Project description

Today’s Social Web allows people in a community of practice to post their own experiences in a variety of content repositories such as blogs, forums, and Q&A websites. However, there is currently no real support for finding and reusing these rich collections of personal experience. Available search functions treat experience merely as text, which is indexed and retrieved like any other document. The objective of the project cluster Extraction and Case-Based Processing of Experiential Knowledge from Internet Communities is the analysis, development, and experimental application and evaluation of new knowledge-based methods, particularly from case-based reasoning (CBR), information extraction, and machine learning, to extract and process experiences in Internet communities. The project cluster consists of three projects led by the University of Marburg, the University of Trier, and the Goethe University Frankfurt. All three have chosen the field of cooking as a joint application domain to demonstrate and empirically evaluate the developed methods. The EVER project is funded by the 'Deutsche Forschungsgemeinschaft' (DFG) from 2011 to 2014.

Related publications:

  • Spät A., Keppler M., Schmidt M., Kohlhase M., Lauritzen N., and Schumacher P. “GoetheShaker - Developing a rating score for automated evaluation of cocktail recipes.” In ICCBR Workshop Proceedings, 2014.
  • Schumacher P., and Minor M. “Towards a Trace Index Based Workflow Similarity Function.” In KI 2014: Advances in Artificial Intelligence - 37th Annual German Conference on AI, Stuttgart, Germany, September 22-26, 2014. Proceedings, volume 8736 of Lecture Notes in Computer Science, pages 225–230, 2014.
  • Homburg T., Schumacher P., and Minor M. “Towards workflow planning based on semantic eligibility.” In 28. Workshop "Planen, Scheduling und Konfigurieren, Entwerfen".
  • Schumacher P., and Minor M. “Extracting control-flow from text.” In Proc. of the 2014 IEEE 15th International Conference on Information Reuse and Integration, pages 203–210, San Francisco, California, USA, 2014. IEEE.
  • Schumacher P., Minor M., and Schulte-Zurhausen E. “On the Use of Anaphora Resolution for Workflow Extraction.” In Integration of Reusable Systems, Advances in Intelligent Systems and Computing, pages 151–170.
  • Schumacher P., Minor M., and Schulte-Zurhausen E. “Extracting and Enriching Workflows from Text.” In Proceedings of the 2013 IEEE 14th International Conference on Information Reuse and Integration, pages 285–292, 2013.
  • Schumacher P., and Minor M. “Hybrid Extraction of Personal Workflows.” In Konferenzbeiträge der 7. Konferenz Professionelles Wissensmanagement. Passau, Germany, 2013.
  • Schumacher P., Minor M., Walter K., and Bergmann R. “Extraction of Procedural Knowledge from the Web.” In Workshop Proceedings: WWW’12. Lyon, France, 2012.

Repository of cooking workflows:

We used one of our prototypes to extract a set of 1844 workflows from cooking recipes. We make the repository available under the CC BY-SA 4.0 license. Please note that the workflows were created automatically using natural language processing approaches, so we cannot give any guarantee for their quality or correctness. The workflows are intended to be used within the CAKE framework. Please refer to the website of the framework for information about the workflow format.

Download repository of cooking workflows.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Source code:

We developed a workflow extraction framework which can be used to develop applications in arbitrary domains. We are publishing our code under the Apache License Version 2.0. Our framework uses several third-party libraries which are not published under the Apache License Version 2.0, so we need to publish that code separately. Please refer to the following list, which shows the different packages and their respective licenses. The software needs a Linux system to run. It was tested on Fedora 19. The installation script should also work on Debian-like systems, but we have not tested it. You may need to manually install the packages required to compile Sundance (gcc-c++, glibc-static and libstdc++-static).

Published under a custom license (Please refer to the license included in the archive):

  • Sentence UNDerstanding ANd Concept Extraction Download 

Published under Apache License Version 2.0:

Quickstart for cooking domain:

  1. Please run on a Linux system (tested on Fedora 19)
  2. Check if the following packages are installed: gcc-c++, glibc-static and libstdc++-static
  3. Download: Ever and Sundance
  4. Run the installation script
  5. Download a small set of cooking recipes for testing.
  6. Open a terminal and navigate to the installation folder
  7. Run: java -jar ever.jar --input <ABSOLUTE_PATH_TO_INPUT_FOLDER> --output <ABSOLUTE_PATH_TO_OUTPUT_FOLDER>
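
Since step 7 expects absolute paths, a small helper can resolve relative folder names before the call. The following is a minimal sketch and not part of the framework: the class name EverCommandSketch and the folder names "recipes" and "workflows" are assumptions for illustration, while the --input and --output flags are the ones shown in step 7.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

/** Illustrative helper (hypothetical, not shipped with the framework). */
public class EverCommandSketch {

    /** Builds the quickstart command; relative folders are made absolute first. */
    static List<String> buildCommand(String inputDir, String outputDir) {
        Path input = Paths.get(inputDir).toAbsolutePath().normalize();
        Path output = Paths.get(outputDir).toAbsolutePath().normalize();
        return List.of("java", "-jar", "ever.jar",
                "--input", input.toString(),
                "--output", output.toString());
    }

    public static void main(String[] args) {
        // "recipes" and "workflows" are placeholder folder names.
        List<String> command = buildCommand("recipes", "workflows");
        System.out.println(String.join(" ", command));
    }
}
```

The printed line can be pasted into a terminal opened in the installation folder. Passing the same list to java.lang.ProcessBuilder would start the extraction directly from Java.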

Use:

The framework needs two components to run. The first one is the Java component. It contains the control logic for the pipeline, the workflow Java data structure, and a set of initial filters. The filters delivered with it are sufficient to extract workflows in the cooking domain. Advanced filters such as statistical anaphora resolution or numeric value and unit handling are not included because they were published under the GPL; they therefore need to be downloaded separately. The second component is the NLP tool SUNDANCE (Sentence UNDerstanding ANd Concept Extraction). SUNDANCE is currently mandatory because the extraction framework interprets the results from SUNDANCE. A different NLP tool can be used, but its output must then be mapped to the output format of SUNDANCE, and the current filters are tailored to SUNDANCE. SUNDANCE is published under a custom license which is included in the archive. The advanced filters can be used like the simple filters. Keep in mind that they are published under a different license which contains a copyleft clause. We provide an installation script which performs the basic installation. For further information, have a look at the Javadoc.

To develop a pipeline for a new domain, the best entry point is the simple cooking pipeline (ever.simpleCooking.CookingPipeline). It shows how different filters are used to set up a pipeline. Existing filters can be reused. The order of the filters matters; in particular, the CaseLoader filter usually comes first. To develop new filters, have a look at the existing ones. To start the extraction, run ever.pipeline.Main.
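
Why the order of the filters matters can be illustrated with a minimal, self-contained sketch. Note that this is not the framework's actual API: the Case and Filter types and the SentenceTaskFilter below are hypothetical stand-ins, and only the name CaseLoader is taken from the text above. For the real interfaces, please consult the Javadoc.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only: these types are hypothetical, not the EVER API. */
public class PipelineSketch {

    /** Hypothetical stand-in for the workflow data structure passed between filters. */
    static class Case {
        String rawText;                               // set by the loader
        final List<String> tasks = new ArrayList<>(); // filled by later filters
    }

    /** Hypothetical filter contract: each filter reads and enriches the case. */
    interface Filter {
        void apply(Case c);
    }

    /** Loads the raw text; later filters depend on it, so it has to run first. */
    static class CaseLoader implements Filter {
        private final String text;
        CaseLoader(String text) { this.text = text; }
        public void apply(Case c) { c.rawText = text; }
    }

    /** Toy task extractor: treats every sentence as one task. */
    static class SentenceTaskFilter implements Filter {
        public void apply(Case c) {
            if (c.rawText == null) {
                throw new IllegalStateException("CaseLoader must run before this filter");
            }
            for (String sentence : c.rawText.split("\\.")) {
                String trimmed = sentence.trim();
                if (!trimmed.isEmpty()) {
                    c.tasks.add(trimmed);
                }
            }
        }
    }

    /** Runs the filters strictly in list order on a fresh case. */
    static Case run(List<Filter> filters) {
        Case c = new Case();
        for (Filter f : filters) {
            f.apply(c);
        }
        return c;
    }

    public static void main(String[] args) {
        Case c = run(List.of(
                new CaseLoader("Chop the onions. Fry them in butter."),
                new SentenceTaskFilter()));
        System.out.println(c.tasks);
    }
}
```

Swapping the two filters in the list makes the sketch fail with an IllegalStateException, because the task extractor would run before any text has been loaded.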

Input format:

A simple XML-based input format is used to deliver recipes or how-tos to the application. A sample file can be found here. It is basically a list of ingredients and steps. We crawled a large number of recipes and how-tos from multiple websites (such as allrecipes.com and wikihow.com). Unfortunately, we are not allowed to share them, but we can share a small set of cooking recipes from our personal recipe collection.
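
To give an idea of how such a list of ingredients and steps could be read, here is a minimal sketch using the XML parser from the Java standard library. The element names (recipe, ingredient, step) are assumptions made for illustration only; the authoritative schema is the one used in the sample file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/** Illustrative sketch: reads a hypothetical recipe file, not the real schema. */
public class RecipeInputSketch {

    // Hypothetical sample; the element names are assumptions, not the EVER format.
    static final String SAMPLE =
            "<recipe>"
          + "<ingredients><ingredient>2 onions</ingredient>"
          + "<ingredient>butter</ingredient></ingredients>"
          + "<steps><step>Chop the onions.</step>"
          + "<step>Fry them in butter.</step></steps>"
          + "</recipe>";

    /** Parses an XML string into a DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    /** Collects the trimmed text content of all elements with the given tag name. */
    static List<String> textOf(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent().trim());
        }
        return values;
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println("Ingredients: " + textOf(doc, "ingredient"));
        System.out.println("Steps: " + textOf(doc, "step"));
    }
}
```

When adapting the sketch to the actual sample file, only the tag names passed to textOf should need to change.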