From HTML Documents to Web Tables and Rules

Type: Article
Conference: The Eighth International Conference on Electronic Commerce (ICEC 2006), August 14-16, 2006, Fredericton, New Brunswick, Canada
Subject: data extraction; data record alignment; rule-based languages
Abstract: We present a browser-extending Semantic Web extraction system that maps HTML documents to tables and, where possible, to rules. First, the basic data extractor ViPER distills and reorganizes semi-structured information into a tabular data structure, which can then be browsed and/or submitted to further machine processing. Second, exemplifying the latter, the extended knowledge extractor Rex ViPER mines the resulting tables for structural properties and functional dependencies. Rules are generated to obtain a more compact and manageable, often also enriched, knowledge representation. The resulting fully structured information, RuleML-serialized facts and rules, can be stored along with the original documents, queried by rule engines such as OO jDREW and FLORID, and interchanged between Web Services. Thus Rex ViPER contributes to automating the construction of a machine-processable Semantic Web.
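
The idea of mining an extracted table for functional dependencies can be illustrated with a minimal sketch. This is not Rex ViPER's algorithm, which this record does not describe; it is a naive check, assuming a table represented as a list of dictionaries, that tests whether one column determines another. The function names and the sample rows are hypothetical.

```python
from itertools import permutations


def holds(rows, lhs, rhs):
    """Return True if column `lhs` functionally determines column `rhs`,
    i.e. equal `lhs` values never appear with different `rhs` values."""
    seen = {}
    for row in rows:
        key, value = row[lhs], row[rhs]
        # setdefault stores the first value seen for this key and
        # returns the stored value on every later occurrence.
        if seen.setdefault(key, value) != value:
            return False
    return True


def single_column_fds(rows, columns):
    """List every single-column dependency (a, b) where a determines b."""
    return [(a, b) for a, b in permutations(columns, 2) if holds(rows, a, b)]


# Hypothetical rows, standing in for a table extracted from an HTML page.
table = [
    {"model": "A100", "brand": "Acme", "price": 199},
    {"model": "A200", "brand": "Acme", "price": 249},
    {"model": "B100", "brand": "Bolt", "price": 199},
]

fds = single_column_fds(table, ["model", "brand", "price"])
print(fds)
```

A dependency found this way, such as model determining brand, is the kind of regularity that could be stated once as a rule instead of being repeated in every row. Dependencies observed in a small sample may be coincidental, so a real system would need further evidence before emitting a rule.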
Language: English
Affiliation: NRC Institute for Information Technology; National Research Council Canada
Peer reviewed: No
NRC number: 49310
NPARC number: 5764332
Record identifier: 4b3ab6b5-8cb0-4ed8-838b-7ed43067e340
Record created: 2009-03-29
Record modified: 2016-05-09