Frequencies are based on a corpus of about 104 M tokens in its 'clean' version (about 140 M tokens in its 'dirty' version), compiled from the Web in Feb-Mar 2006.
These and other web datasets on this site are snapshots of an evolving corpus. Contents and counts vary from dataset to dataset and are subject to change.

ver. 4 July 2007