Journal of Information Technology & Tourism, Vol 11, No 3 (2009)

Ontology-Based Information Extraction from Tourism Websites

Christina Feilmayr, Stefan Parzer, Birgit Pröll

Abstract


The growing amount of semi-structured and unstructured data on heterogeneously designed tourism Web sites creates a need for information extraction (IE) mechanisms that support semi-automatic data acquisition for building tourism recommender systems or tourism Web portals. In this paper we analyze aspects of heterogeneity in individually maintained accommodation Web sites and discuss the applicability of different IE types and techniques to this domain. We then develop a rule- and ontology-based IE approach and describe the components of our prototype crawler. Finally, we discuss relevant issues that emerged during the development and evaluation of the prototype.
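To make the idea of rule- and ontology-based IE concrete, here is a minimal sketch under stated assumptions. It is not the authors' prototype, which builds on GATE according to the keywords. In this sketch, a few hypothetical regular-expression rules are each tied to a concept of an assumed mini accommodation ontology (`Facility`, `PricePerNight`, `Stars`). The rules annotate the text of an accommodation page, and the matches are collected as facts per concept. The names `Rule`, `RULES`, and `extract` are illustrative only.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    concept: str         # hypothetical ontology concept the match is assigned to
    pattern: re.Pattern  # extraction rule; group 1 captures the value


# Assumed mini-ontology: each concept is populated by one surface-pattern rule.
RULES = [
    Rule("Facility", re.compile(r"\b(sauna|swimming pool|parking|wi-?fi)\b", re.I)),
    Rule("PricePerNight", re.compile(r"(?:EUR|€)\s?(\d+(?:[.,]\d{2})?)", re.I)),
    Rule("Stars", re.compile(r"\b([1-5])\s?(?:-\s?)?stars?\b", re.I)),
]


def extract(text: str) -> dict[str, list[str]]:
    """Apply every rule to the text and group distinct matches by concept."""
    facts: dict[str, list[str]] = {}
    for rule in RULES:
        for match in rule.pattern.finditer(text):
            value = match.group(1).lower()  # normalize case for deduplication
            values = facts.setdefault(rule.concept, [])
            if value not in values:
                values.append(value)
    return facts


if __name__ == "__main__":
    page = ("Our 4-star hotel offers a sauna, free WiFi and parking. "
            "Double room from EUR 89.00 per night.")
    print(extract(page))
```

Tying each rule to an ontology concept, rather than to a page layout, is what lets the same rules work across heterogeneously designed sites. A real system would also need page crawling, boilerplate removal, and richer linguistic rules, none of which this sketch attempts.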

Keywords: e-tourism, information extraction, GATE (General Architecture for Text Engineering)

Journal of Information Technology & Tourism (ISSN: 1098-3058) is hosted at MODUL University Vienna and published by Cognizant.