Tightly Packed Tries: How to Fit Large Models into Memory, and Make them Load Fast, Too

Type: Article
Proceedings title: Proceedings of the Workshop on Software Engineering, Testing, and Quality Assurance for Natural Language Processing (SETQA-NLP 2009)
Conference: Workshop on Software Engineering, Testing, and Quality Assurance for Natural Language Processing (SETQA-NLP 2009), Boulder, CO, USA, June 05, 2009
Pages: 31–39; # of pages: 9
Abstract: We present Tightly Packed Tries (TPTs), a compact implementation of read-only, compressed trie structures with fast on-demand paging and short load times. We demonstrate the benefits of TPTs for storing n-gram back-off language models and phrase tables for statistical machine translation. Encoded as TPTs, these databases require less space than flat text file representations of the same data compressed with the gzip utility. At the same time, they can be mapped into memory quickly and searched directly in time linear in the length of the key, without the need to decompress the entire file. The overhead for local decompression during search is marginal.
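To illustrate the general idea behind a read-only packed trie (this is a simplified sketch, not the actual TPT encoding described in the paper; the node layout, function names, and fixed-width offsets here are assumptions for illustration), a trie can be serialized into a single byte buffer of nodes with child-offset tables, so that lookup walks the buffer directly, in time linear in the key length, without deserializing the whole structure:

```python
import struct

def pack_trie(entries):
    """Serialize a {key: int-value} mapping into one byte buffer.

    Illustrative node layout (not the paper's actual format):
      [has_value: 1 byte][value: 4 bytes, if present]
      [num_children: 1 byte][(child_byte: 1 byte, child_offset: 4 bytes)...]
    Children are emitted before their parent, so all offsets are known
    when a node is written.
    """
    root = {}
    for key, val in entries.items():
        node = root
        for ch in key.encode():
            node = node.setdefault(ch, {})
        node[None] = val  # None key marks "a value ends here"

    buf = bytearray()

    def emit(node):
        # Emit children first (post-order), collecting their offsets.
        children = [(ch, emit(sub)) for ch, sub in
                    sorted((c, s) for c, s in node.items() if c is not None)]
        offset = len(buf)
        has_val = None in node
        buf.append(1 if has_val else 0)
        if has_val:
            buf.extend(struct.pack("<I", node[None]))
        buf.append(len(children))
        for ch, child_off in children:
            buf.append(ch)
            buf.extend(struct.pack("<I", child_off))
        return offset

    root_off = emit(root)
    buf.extend(struct.pack("<I", root_off))  # trailer: offset of the root
    return bytes(buf)

def lookup(buf, key):
    """Walk the packed buffer directly; runs in time linear in len(key)."""
    pos = struct.unpack_from("<I", buf, len(buf) - 4)[0]  # root offset
    for ch in key.encode():
        if buf[pos]:          # skip this node's value, if any
            pos += 5
        else:
            pos += 1
        n = buf[pos]; pos += 1
        for i in range(n):    # scan the child table for the next byte
            if buf[pos + 5 * i] == ch:
                pos = struct.unpack_from("<I", buf, pos + 5 * i + 1)[0]
                break
        else:
            return None       # no child with this byte: key absent
    return struct.unpack_from("<I", buf, pos + 1)[0] if buf[pos] else None
```

Because the buffer is read-only and position-independent (all offsets are relative to the buffer start), it could equally be memory-mapped from disk and searched in place, which is the property the paper exploits for fast load times and on-demand paging.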
Publication date:
Language: English
Affiliation: National Research Council Canada (NRC-CNRC); NRC Institute for Information Technology
Peer reviewed: Yes
NRC number: 52533
NPARC number: 16435915
Record identifier: 9eb37696-ddab-4265-9f10-e5ff2f83779a
Record created: 2010-11-24
Record modified: 2016-05-09