Frequency of Use Analysis
Modern Prose
The analysis is based on a source data file of 9,635,811 characters
(about 9.6 million), which contains a total of 1,711,549 words plus
spaces, punctuation and formatting. The data was compiled from selected
modern novels by various authors, available free to download in
e-text form from the Baen Free Library on the publisher's website.
The e-texts were saved as text-only files and edited to
remove all non-author information. They were then amalgamated into one large
file and processed for analysis (see Note).
See the "Frequency of Use" percentages page for a detailed analysis
of the keying percentages from which the above graphs were made.
Note that some punctuation marks are used more often than some letters.
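As a rough illustration of how keying percentages of this kind can be computed, here is a minimal Python sketch. It is not the tool used for this analysis. The function name `char_percentages` and the sample sentence are hypothetical, and folding upper and lower case together is an assumption that the original analysis may not have made.

```python
from collections import Counter

def char_percentages(text):
    """Return (character, percentage) pairs, most frequent first."""
    # Assumption: upper and lower case are counted as the same key.
    counts = Counter(text.lower())
    total = sum(counts.values())
    if total == 0:
        return []
    return [(ch, 100.0 * n / total) for ch, n in counts.most_common()]

# Hypothetical sample text; a real run would read the amalgamated file.
sample = "It was late, and the rain, as ever, fell."
for ch, pct in char_percentages(sample)[:5]:
    print(repr(ch), round(pct, 2))
```

Spaces and punctuation are counted along with letters, which is how a comma or full stop can rank above the rarer letters.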
These are the "Top 100" words with cumulative percentages. Together
they account for 50% of all words used by the authors.
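A table of top words with cumulative percentages can be built along the following lines. This is only a sketch under stated assumptions: the function name `top_words` is hypothetical, and the rule used here for what counts as a word (a run of letters, optionally with internal apostrophes) may differ from the rule used for the figures on this page.

```python
import re
from collections import Counter

def top_words(text, n=100):
    """Return (word, percent, cumulative_percent) for the n most common words."""
    # Assumption: a word is a run of letters, with optional internal apostrophes.
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    total = len(words)
    rows = []
    running = 0.0
    for word, count in Counter(words).most_common(n):
        pct = 100.0 * count / total
        running += pct  # running total gives the cumulative percentage
        rows.append((word, pct, running))
    return rows

# Hypothetical sample text.
for row in top_words("the cat and the dog and the bird", n=2):
    print(row)
```

The cumulative figure in the last row shows what share of all words the listed words cover together.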
Here is a selection of the longest common words (15 characters or more)
derived from the analysis. N.B. Proper and specialised names are not
included. The longest word, "uncharacteristically", is only 20
characters in length and is used only ten times.
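Selecting words by length can be sketched as below. The function name `long_words` and the sample sentence are hypothetical, and this sketch does not remove proper or specialised names; that step would need a separate list or manual review.

```python
import re
from collections import Counter

def long_words(text, min_len=15):
    """Return (word, count) pairs for words of at least min_len letters,
    longest first, with ties broken alphabetically."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    long_only = [(w, c) for w, c in counts.items() if len(w) >= min_len]
    return sorted(long_only, key=lambda wc: (-len(wc[0]), wc[0]))

# Hypothetical sample text.
sample_text = "He was uncharacteristically quiet about his responsibilities."
for word, count in long_words(sample_text):
    print(len(word), word, count)
```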
If you find this information
interesting, please let us know.
Note: This
condensing and editing has been done for analysis purposes only
and e-texts copied for normal use should not have their content
changed or parts removed.