1. Recall-aware Information Extraction
Textual information extraction is a core step in the construction of many knowledge bases. In most cases, extracted facts are accompanied by a precision, i.e., a confidence that they are correct. The recall of extraction is then usually controlled only indirectly, by adjusting the minimum accepted precision.
- John has two children, Bob and Mary -> A typical way to name ALL children
- John brought his children Bob and Mary to school -> There could reasonably be other children who are, e.g., too old or too young to be brought to school.
The technical work would consist of extending an existing information extraction system such as ClausIE towards adding a recall value to fact sets, and training it using distant supervision.
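The link between accepted precision and recall described above can be illustrated with a minimal sketch. The facts, confidence values and correctness labels below are invented toy data, and `precision_recall` is a hypothetical helper, not part of ClausIE or any other extraction system.

```python
from typing import List, Tuple

# Toy data (invented for illustration): (fact, confidence, is_correct).
Extraction = Tuple[str, float, bool]

EXTRACTIONS: List[Extraction] = [
    ("child(John, Bob)", 0.9, True),
    ("child(John, Mary)", 0.8, True),
    ("child(John, Anna)", 0.5, True),
    ("child(John, Rex)", 0.4, False),
]
TOTAL_TRUE_FACTS = 3  # assumed ground truth: John has three children


def precision_recall(extractions: List[Extraction],
                     threshold: float,
                     total_true: int) -> Tuple[float, float]:
    """Precision and recall of the facts whose confidence is >= threshold."""
    accepted = [e for e in extractions if e[1] >= threshold]
    correct = sum(1 for e in accepted if e[2])
    # Convention: an empty accepted set counts as perfectly precise.
    precision = correct / len(accepted) if accepted else 1.0
    recall = correct / total_true if total_true else 0.0
    return precision, recall


if __name__ == "__main__":
    for t in (0.7, 0.3):
        p, r = precision_recall(EXTRACTIONS, t, TOTAL_TRUE_FACTS)
        print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")
```

In this toy setting, lowering the threshold raises recall at the cost of precision. Recall can only be computed here because the true number of children is assumed to be known; a recall-aware extractor would have to estimate it from the text itself.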
2. Metrics for Relative Completeness
- Assessing the Completeness of Entities in Knowledge Bases, Albin Ahmeti, Simon Razniewski and Axel Polleres, ESWC, Portoroz, Slovenia, 2017
- Doctoral Advisor or Medical Condition: Towards Entity-specific Rankings of Knowledge Base Properties, Simon Razniewski, Vevake Balaraman and Werner Nutt, ADMA, 2017
- Sematch: Semantic similarity framework for Knowledge Graphs, Ganggao Zhu, Carlos A. Iglesias, Knowledge-Based Systems, 2017
3. Multilingual Coverage of Wikipedia
Wikipedia pages in different languages about the same entity often vary widely in size and content. The goal of this project is to quantify and qualify these differences, and to visualize them via a user script.
On the technical level, the idea is to pursue two approaches: (1) Multilingual topic modelling, to discover topics covered more/less in one article or the other, and (2) Interlinking, which can be further structured based on the information available in Wikidata. The results should be turned into a Wikipedia plugin, similar to Recoin.
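As a rough illustration of the interlinking approach, the sketch below compares the sets of entities linked from two language versions of the same article. It assumes the links have already been resolved to language-independent identifiers (for example via Wikidata); the identifiers shown are made-up placeholders, and `compare_link_sets` is a hypothetical helper rather than an existing API.

```python
from typing import Dict, Iterable, List, Union


def compare_link_sets(links_a: Iterable[str],
                      links_b: Iterable[str]) -> Dict[str, Union[float, List[str]]]:
    """Compare the entities linked from two language versions of an article."""
    a, b = set(links_a), set(links_b)
    union = a | b
    return {
        # Jaccard overlap: 1.0 means both versions link to the same entities.
        "jaccard": len(a & b) / len(union) if union else 1.0,
        # Entities mentioned in only one version hint at coverage gaps.
        "only_in_a": sorted(a - b),
        "only_in_b": sorted(b - a),
    }


# Placeholder identifiers, invented for illustration only.
EN_LINKS = ["Q100", "Q200", "Q300"]
DE_LINKS = ["Q200", "Q300", "Q400", "Q500"]

if __name__ == "__main__":
    print(compare_link_sets(EN_LINKS, DE_LINKS))
```

The entities that appear in only one version could then be grouped, for instance by their Wikidata class, to describe what kind of content one language edition covers and the other does not.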
4. Recommender System for Gliding
Gliding is both a recreational and a competitive sport. On good days, glider pilots can stay in the air for 8 hours or more, during which they can cover significant distances (800 km and more). Most glider pilots upload their competitive flights to an online platform (Onlinecontest.org), where flights are listed daily and ranked using points that are based on the covered distance and the performance of the aircraft.
To achieve high points on a given day, glider pilots have to carefully choose a task (flight route) that, given their plane, skills and the weather conditions, allows them to cover the maximal distance. Each of these three factors can make a huge difference: overestimating the weather conditions may lead to not completing the task, and it is not uncommon for experienced pilots to cover twice the distance of beginners, or more.
The goal of this project is to develop a prototype of a gliding task recommendation system, which takes into account the factors mentioned above. The core component of the prototype will be the similarity function for tasks, which will then be used in a standard recommender systems framework (i.e., collaborative filtering or content-based filtering).
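A minimal sketch of how a task similarity function could plug into content-based filtering is given below. Everything in it is an assumption made for illustration: the `Task` fields, the equal weighting of distance and turnpoint overlap, and the example tasks. Weather, pilot skill and aircraft performance are deliberately left out, although a real prototype would have to model them.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Task:
    """Hypothetical task representation: a name, a length and its turnpoints."""
    name: str
    distance_km: float
    turnpoints: FrozenSet[str]


def task_similarity(a: Task, b: Task, w_distance: float = 0.5) -> float:
    """Similarity in [0, 1] mixing distance closeness and turnpoint overlap."""
    longest = max(a.distance_km, b.distance_km)
    dist_sim = 1.0 - abs(a.distance_km - b.distance_km) / longest if longest > 0 else 1.0
    union = a.turnpoints | b.turnpoints
    tp_sim = len(a.turnpoints & b.turnpoints) / len(union) if union else 1.0
    return w_distance * dist_sim + (1.0 - w_distance) * tp_sim


def recommend(flown: List[Task], candidates: List[Task], k: int = 2) -> List[Task]:
    """Content-based filtering: rank candidates by similarity to flown tasks."""
    def score(candidate: Task) -> float:
        return max((task_similarity(candidate, f) for f in flown), default=0.0)
    return sorted(candidates, key=score, reverse=True)[:k]


# Invented example tasks.
FLOWN = [Task("A", 300.0, frozenset({"TP1", "TP2"}))]
CANDIDATES = [
    Task("B", 320.0, frozenset({"TP1", "TP2"})),
    Task("C", 800.0, frozenset({"TP3"})),
    Task("D", 310.0, frozenset({"TP1", "TP4"})),
]

if __name__ == "__main__":
    print([t.name for t in recommend(FLOWN, CANDIDATES)])
```

Ranking by the best match against previously flown tasks is only one possible choice; a collaborative filtering variant would instead compare pilots with each other and reuse the same similarity function to group their flights.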
5. Exploiting Existential Information
Existential information, i.e., knowledge about the number of facts that hold in reality (e.g., MPII has 5 departments), is a recent addition to knowledge bases, which classically focus on facts that link entities (e.g., D5 is a department of MPII). The goal of this work is to exploit existential information as derived by  in some part of the KB lifecycle, i.e., either information extraction, KB consolidation, or question answering.