Wissensrohstoff Text - Eine Einführung in das Text Mining

Most of the world's knowledge is described in digitally available texts. These texts are an important source of knowledge, but how can that knowledge be extracted? This updated and expanded new edition of the first German textbook on the topic shows how digital text can be prepared, processed, and used in applications with the help of text mining.

The authors

Professor Dr. Chris Biemann is the scientific director of the House of Computing and Data Science and heads the Language Technology Group at the Department of Informatics, both at the University of Hamburg.
Professor Dr. Gerhard Heyer headed the Natural Language Processing chair at the Department of Computer Science of Leipzig University.
Professor Dr. Uwe Quasthoff headed the project "Deutscher Wortschatz" at the Natural Language Processing group of Leipzig University.


On this page you will find various resources that are used or referenced in the book. These include the text data used and the ASV Online Toolbox, where you can try out the procedures explained in the book directly in your browser.

Data

German news corpus (Germany) 2019, different sizes

German Web corpus (Germany) 2019, different sizes

More downloads

Glossary

Book glossary

The book's glossary can be downloaded here (in German).

Tools

ASV Online Toolbox

The ASV Online Toolbox is a modular collection of tools for exploring written language data. It lets you use many of the techniques presented in the book directly in your browser.

ASV Toolbox

The ASV Toolbox is a modular collection of tools for exploring written language data. It was created by the NLP Group at Leipzig University and is no longer actively developed.

Download at the Language Technology Group, University of Hamburg: ASV Toolbox