Databases and Information Systems

The core challenge in the exploration and further development of Databases and Information Systems lies in the area of Data Management, which is concerned with handling data throughout its life cycle: from acquisition, through analysis, to the generation of further data resulting from that analysis.

Data is a key input for all kinds of data-driven solutions, including digital systems, advanced analytics (e.g., Machine Learning), and data-driven applications (e.g., Recommender Systems and Question Answering). The success of these applications relies on fast access to high-quality data. Therefore, the investigation of Data Management techniques is fundamental to supporting large-scale and meaningful data-driven solutions.

Challenges

Despite the progress made in Data Management over the last decades, the area constantly faces new opportunities and challenges, driven by the increasing amount of user-generated data and the rise of novel data models and computer architectures. Therefore, novel data processing techniques that exploit the latest technology need to be investigated to successfully support data-driven solutions.

Key areas

The research group of Databases and Information Systems at RUB focuses on devising novel techniques for managing decentralized data efficiently and effectively. In particular, it addresses the following research problems:

  • query optimization: reducing the execution time and amount of resources used to evaluate queries over large data sources.
  • federated query processing: source selection and query planning techniques to efficiently execute queries over decentralized, autonomous sources composing a federation.
  • data quality: detection of quality issues including incomplete or incorrect statements in data sources.
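
To make the notion of source selection more concrete, the following is a minimal sketch of one simple strategy. It assumes that every source in the federation advertises the set of predicates it contains. It is not the method of any particular system; the function name, the source names, and the predicate catalogue are hypothetical and serve only as an illustration.

```python
from typing import Dict, List, Set, Tuple

# A triple pattern is (subject, predicate, object); terms starting with "?" are variables.
TriplePattern = Tuple[str, str, str]


def select_sources(
    patterns: List[TriplePattern],
    source_predicates: Dict[str, Set[str]],
) -> Dict[TriplePattern, List[str]]:
    """Map each triple pattern to the sources that may contribute answers.

    Assumption: a source is relevant for a pattern if it advertises the
    pattern's predicate. If the predicate is a variable, every source is
    considered relevant, because nothing can be ruled out.
    """
    selection: Dict[TriplePattern, List[str]] = {}
    for pattern in patterns:
        predicate = pattern[1]
        if predicate.startswith("?"):
            relevant = sorted(source_predicates)
        else:
            relevant = sorted(
                name
                for name, predicates in source_predicates.items()
                if predicate in predicates
            )
        selection[pattern] = relevant
    return selection


# Hypothetical federation of three autonomous sources.
catalogue = {
    "source_a": {"ex:author", "ex:title"},
    "source_b": {"ex:title", "ex:year"},
    "source_c": {"ex:affiliation"},
}

query = [
    ("?paper", "ex:title", "?title"),
    ("?paper", "ex:author", "?person"),
    ("?person", "?p", "?o"),
]

plan = select_sources(query, catalogue)
```

Contacting only the selected sources for each pattern avoids sending requests to sources that cannot contribute answers. A query planner would then decide how to group and order the patterns assigned to each source.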

Projects and partners

The projects of the team led by Prof. Dr. Maribel Acosta have contributed to the state of the art in query processing over large, federated data on the web modelled as knowledge graphs, including the publication of the following systems:

  • ANAPSID: adaptive query engine for federations of SPARQL endpoints
  • nLDE: adaptive query engine for Triple Pattern Fragments
  • HARE: enhancing query answer completeness with crowdsourcing