Tutorial: Commonsense knowledge extraction and consolidation

Speaker: Simon Razniewski, Max Planck Institute for Informatics, Germany

Venue and date: KI 2020, September 22, 2020, Bamberg, Germany

Description of its topic and goal. Machine-readable commonsense knowledge (CSK) is fundamental for automated reasoning about the general world, and relevant for downstream applications such as question answering and dialogue. In this tutorial, we focus on the construction and consolidation of large repositories of commonsense knowledge. After briefly surveying crowdsourcing approaches to commonsense knowledge compilation, the main parts of the tutorial investigate (i) automated extraction of CSK from text, together with the relevant choices of extraction methodology and corpora, and (ii) knowledge consolidation techniques that aim to canonicalize, clean, or enrich initial extraction results. We end the tutorial with an outlook on application scenarios and the promises of deep pretrained language models.

Video recordings

  1. Motivation [link to video, 17 minutes]
  2. Knowledge representation [link to video, 12 minutes]
  3. Crowdsourcing [link to video, 13 minutes]
  4. Text Extraction [link to video, 30 minutes]
  5. Transformer-based techniques [link to video, 13 minutes]
  6. Consolidation [link to video, 14 minutes]
  7. Summary and Outlook [link to video, 8 minutes]

Live event:

  • There will be no live presentation. Attendees can watch the video recordings linked above at their own pace.
  • I will be online from 11:00-12:00 to answer questions [zoom link].
  • You can also reach out to me at srazniew@mpi-inf.mpg.de.

Target audience and expected prerequisite knowledge. The target audience of this tutorial is researchers and practitioners in areas of artificial intelligence such as automated reasoning, planning, question answering, or dialogue, who are interested in learning about techniques for acquiring structured knowledge to bootstrap their methods. The tutorial gives them an overview of extraction and consolidation paradigms that will help them acquire commonsense knowledge for their own specific use cases, and it surveys existing repositories that may be relevant for reuse.
Attendees are expected to have basic knowledge of knowledge representation. No previous knowledge of natural language processing is expected.

Organizer’s background. The organizer has considerable experience in knowledge base construction, and has more recently ventured into extraction and reasoning methods for commonsense knowledge. Two particularly relevant projects are Quasimodo [1] and Dice [2]. The former focuses on extracting salient commonsense knowledge from question corpora such as Reddit and Google Autocompletion, and provides the most extensive collection of general-world knowledge to date. The latter is aimed at fighting sparsity and incoherence in existing commonsense knowledge repositories, using a logical reasoning framework and taxonomical information to consolidate and complete them.

Bibliography

[1] J. Romero, S. Razniewski, K. Pal, J. Z. Pan, A. Sakhadeo, and G. Weikum, “Commonsense Properties from Query Logs and Question Answering Forums,” CIKM, 2019.

[2] Y. Chalier, S. Razniewski, and G. Weikum, “Joint Reasoning for Multi-Faceted Commonsense Knowledge,” AKBC, 2020.

[3] R. Speer and C. Havasi, “Representing General Relational Knowledge in ConceptNet 5,” LREC, 2012.

[4] B. Dalvi, N. Tandon, and P. Clark, “Domain-Targeted, High Precision Knowledge Extraction,” TACL, 2017.

[5] N. Tandon, G. de Melo, F. Suchanek, and G. Weikum, “WebChild: Harvesting and Organizing Commonsense Knowledge from the Web,” WSDM, 2014.

[6] P. Jansen, “Multi-hop Inference for Sentence-level TextGraphs: How Challenging is Meaningfully Combining Information for Science Question Answering?,” TextGraphs, 2018.

[7] M. Sap et al., “ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning,” AAAI, 2019.

[8] A. Bosselut, H. Rashkin, M. Sap, C. Malaviya, A. Celikyilmaz, and Y. Choi, “COMET: Commonsense Transformers for Automatic Knowledge Graph Construction,” ACL, 2019.

Further references upon request.