Word frequency data from the Corpus del Español

9 Dec

This site (Word frequency data, from the Corpus of Contemporary American English: http://www.wordfrequency.info/) contains what we believe is the most accurate frequency data for English. It comes in a number of different formats (see the samples: 100,000- and 60,000-word lists).

For the 5,000-60,000-word lists, you can download a simple word list or frequencies by genre, or get the list as an eBook or a printed frequency dictionary. For the 100,000-word list, you can see detailed frequency information for many genres in several different corpora. In addition to word frequency data, you can also download up to 155 million n-grams and 4.3 million collocates.
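
An n-gram is simply a sequence of n consecutive words. As a minimal sketch of what an n-gram count involves (using a made-up sentence, not corpus data, and not the method used to build the downloadable n-gram lists), the snippet below extracts contiguous word sequences and tallies them with Python's `collections.Counter`. The function names `ngrams` and `count_ngrams` are hypothetical, chosen for illustration:

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous n-word sequence in tokens, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(tokens, n):
    """Count how often each n-gram occurs in the token list."""
    return Counter(ngrams(tokens, n))


# Toy sentence for illustration only (not taken from any corpus).
tokens = "the cat sat on the mat and the cat slept".split()
bigrams = count_ngrams(tokens, 2)
print(bigrams.most_common(1))  # [(('the', 'cat'), 2)]
```

Collocates extend the same idea: instead of requiring words to be adjacent, they count words that co-occur within a window around a node word.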

In addition to frequency lists for English, we also have what we believe are the most accurate frequency lists for Spanish (http://www.wordfrequency.info/spanish.asp), containing the top 20,000 lemmas/words in the language. The Spanish data is based on the 20 million words from the 1900s in the 100-million-word Corpus del Español, which is the only Spanish corpus that is (1) large, (2) balanced across genres (spoken, fiction, newspaper, academic), and (3) accurately tagged for part of speech and lemma, which is necessary to create a frequency dictionary.
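
To illustrate why lemma and part-of-speech tagging matter for a frequency dictionary, here is a minimal sketch that counts a handful of hand-tagged Spanish tokens two ways: by raw word form and by (lemma, part of speech). The tagged tokens are hypothetical and written by hand; a real pipeline would get them from a tagger, and this is not how the Corpus del Español lists were actually produced. Counting by lemma merges inflected forms (*casa*, *casas*), while the part-of-speech tag keeps apart homographs such as *canto* ("I sing", from *cantar*) and *canto* ("song").

```python
from collections import Counter

# Hypothetical hand-tagged tokens: (word form, lemma, part of speech).
tagged = [
    ("casa", "casa", "n"),
    ("casas", "casa", "n"),
    ("casa", "casa", "n"),
    ("canto", "cantar", "v"),  # "I sing"
    ("cantamos", "cantar", "v"),
    ("canto", "canto", "n"),  # "song"
]


def form_frequencies(tokens):
    """Count each distinct word form, ignoring lemma and part of speech."""
    return Counter(form for form, _, _ in tokens)


def lemma_frequencies(tokens):
    """Count (lemma, part of speech) pairs, merging inflected forms."""
    return Counter((lemma, pos) for _, lemma, pos in tokens)


print(form_frequencies(tagged)["canto"])            # 2: noun and verb conflated
print(lemma_frequencies(tagged)[("cantar", "v")])   # 2: canto + cantamos
print(lemma_frequencies(tagged)[("canto", "n")])    # 1: the noun "song"
```

Without the tags, the form count for *canto* mixes two unrelated words, and *casas* is counted separately from *casa*; the lemma count is what a frequency dictionary entry needs.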

The data is available in a number of formats.
