Keynotes and ICDT Invited Talk

Keynote 1

Title: The Quest for Knowledge

Katja Hose

Aalborg University, Denmark

Abstract:

Throughout the entire history of mankind, humans have always strived to acquire new knowledge. In a way, striving for knowledge is still what motivates researchers across all modern research disciplines. Within Computer Science, especially in the past couple of years, we have witnessed an increasing interest in structured knowledge in the form of graphs, so-called knowledge graphs; not only in academia but also in industry. Read More

Keynote 2

Title: Explainability Queries for ML Models and its Connections with Data Management Problems

Pablo Barceló

Pontifical Catholic University of Chile, Chile

Abstract:

In this talk I will present two recent examples of my research on explainability problems over machine learning (ML) models. In rough terms, these explainability problems deal with specific queries one poses over a ML model in order to obtain meaningful justifications for their results. Read More

Both of the examples I will present deal with “local” and “post-hoc” explainability queries. Here “local” means that we intend to explain the output of the ML model for a particular input, while “post-hoc” refers to the fact that the explanation is obtained after the model is trained. In the process I will also establish connections with problems studied in data management. This with the intention of suggesting new possibilities for cross-fertilization between the area and ML.

The first example I will present refers to computing explanations with scores based on Shapley values, in particular with the recently proposed, and already influential, SHAP-score. This score provides a measure of how different features in the input contribute to the output of the ML model. We provide a detailed analysis of the complexity of this problem for different classes of Boolean circuits. In particular, we show that the problem of computing SHAP-scores is tractable as long as the circuit is deterministic and decomposable, but becomes computationally hard if any of these restrictions is lifted. The tractability part of this result provides a generalization of a recent result stating that, for Boolean hierarchical conjunctive queries, the Shapley-value of the contribution of a tuple in the database to the final result can be computed in polynomial time.

The second example I will present refers to the comparison of different ML models in terms of important families of (local and post-hoc) explainability queries. For the models, I will consider multi-layer perceptrons and binary decision diagrams. The main object of study will be the computational complexity of the aforementioned queries over such models. The obtained results will show an interesting theoretical counterpart to wisdom’s claims on interpretability. This work also suggests the need for developing query languages that support the process of retrieving explanations from ML models, and also for obtaining general tractability results for such languages over specific classes of models.

About the speaker:

Full Professor at Pontificia Universidad Católica de Chile, where he also acts as Director of the Institute for Mathematical and Computational Engineering. He is the author of more than 80 technical papers, has chaired ICDT 2019, will be chairing ACM PODS 2022, and is currently a member of the editorial committee of Logical Methods in Computer Science. From 2011 to 2014 he was the editor of the database theory column of SIGMOD Record. His areas of interest are database theory, logic in computer science, and the emerging relationship between these areas and machine learning.

Keynote 3

Title: Data Profiling – A look back and a look forward

Felix Naumann

Hasso Plattner Institute (HPI), Germany

Abstract:

Data profiling is the act of extracting many different types of metadata from a given dataset. This research area has recently thrived, due to (i) its simple problem statements, such as “discover all key candidates”, (ii) the high computational complexity of the problems, which are often exponential in the number of columns, (iii) the manifold opportunities for optimizations, such as apriori-inspired pruning or data sampling, and (iv) the various application areas for data profiling results, such as query optimization and data cleaning. Read More

Keynote 4

Title: Comparing Apples and Oranges: Fairness and Diversity in Ranking

Julia Stoyanovich

New York University, USA

Abstract:

Algorithmic rankers take a collection of candidates as input and produce a ranking (permutation) of the candidates as output. The simplest kind of ranker is score-based; it computes a score of each candidate independently and returns the candidates in score order. Another common kind of ranker is learning-to-rank, where supervised learning is used to predict the ranking of unseen candidates. For both kinds of rankers, we may output the entire permutation or only the highest scoring k candidates, the top-k. Set selection is a special case of ranking that ignores the relative order among the top-k.Read More

ICDT Invited Talk

Title: What Makes a Variant of Query Determinacy (Un) Decidable?

Jerzy Marcinkowski

University of Wrocław, Poland

Abstract:

Suppose there is a database we have no direct access to, but there are views of this database available to us, defined by some queries Q_1, Q_2 ,. . . Q_k. And we are given another query Q. Will we be able to compute Q only using the available views?

The above question, calling it “the question of determinacy”, sounds almost philosophical. One can easily imagine a bearded man in himation chained to the wall of a cave watching the views projected on the wall and pondering whether, from what he is able to see, the reality can be faithfully reconstructed.Read More

For us it is a database theory question though. And a really well motivated one, with motivations ranging from query evaluation plans optimization (where we prefer a positive answer) to privacy issues (where the preferred answer is negative).

Query determinacy is a broad topic, with literally hundreds of papers published since the late 1980s. This talk is not going to be a “survey” (which would be impossible, within a one hour time frame, and with this speaker), but rather a personal perspective of a person somehow involved in recent developments in the area.

First I will explain how, in the last 30+ years, the question of determinacy was formalized. There are many parameters here: obviously one needs to choose the query language of the queries Q_i and the query language of Q. But – surprisingly – there is also some choice as to what the word “to compute” actually means in this context.

Then I will concentrate on the variants of the decision problem of determinacy (for each choice of parameters there is one such problem–Q_1 , Q_2 , . . . Q_k and Q constitute the instance, and the question is whether Q_1 , Q_2 , . . . Q_k determine Q) and I will talk about how I understand the mechanisms rendering different variants of determinacy decidable or undecidable. This will be on a slightly informal level. No new theorems will be presented, but I think I will be able to show simplified proofs of some of the earlier results.

About the speaker:

Jerzy Marcinkowski received his PhD degree, in 1993 , from the Mathematical Department of the University of Wrocław and spent his entire academic career (with the exception of a three semesters long post-doc at the Laboratoire d’I Informatique Fondamentale de Lille in France) working for the University of Wrocław, where he is currently serves as the head of the Department of Computer Science.

His modus operandi is to try to attack well-abstracted, long standing open problems in logic in computer science, including database theory. Database theory people may know him for his work on DATALOG single rule program boundedness (1996), database repairs and consistent query answering, including prioritized repair (with Jan Chomicki and Sławek Staworko; 2000s) and, more recently, for his work on the Chase algorithm, where he proved (with his student Tomasz Gogacz) that all-instances chase termination is undecidable (2014). His main topic of interest in the last five years has been query determinacy.

Keynotes and ICDT Invited Talk

Keynote 1

Title: The Quest for Knowledge

Katja Hose

Keynote 2

Title: Explainability Queries for ML Models and its Connections with Data Management Problems

Pablo Barceló

Keynote 3

Title: Data Profiling – A look back and a look forward

Felix Naumann

Keynote 4

Title: Comparing Apples and Oranges: Fairness and Diversity in Ranking

Julia Stoyanovich

ICDT Invited Talk

Title: What Makes a Variant of Query Determinacy (Un) Decidable?

Jerzy Marcinkowski

Platinum Sponsor

Bronze Sponsor