Text Analysis Software
Since 1999, in collaboration with the late Dr John Olsson, Seren Web has developed a range of tools for analysing texts. These include tools for counting word occurrences, comparing phrases in two separate texts, and calculating the percentage of words that texts have in common.
These are free for you to use. Please contact us for licensed versions with no limits on text size and with additional bespoke tools for textual analysis.
Mike Slater, thetext.co.uk
Occurrence of Words
Occurrence of words in a text based on word length
This software analyses your text for occurrences of any word: lexical, non-lexical or both. You can specify the minimum length of the word.
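As a rough illustration of this kind of count, here is a minimal Python sketch. It is not the Seren Web implementation: the function name word_occurrences, the kind parameter and the very short FUNCTION_WORDS list are all assumptions made for the example, and a real tool would need a much fuller list of non-lexical words.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny list of non-lexical (function) words.
# A real analysis would need a far more complete list.
FUNCTION_WORDS = {"a", "an", "the", "and", "or", "but", "of", "in", "on", "at", "to", "is", "it"}


def word_occurrences(text, min_length=1, kind="both"):
    """Count words in text that are at least min_length letters long.

    kind is "lexical", "non-lexical" or "both" (an assumed interface).
    """
    # Lower-case the text and keep runs of letters, allowing one inner apostrophe.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter()
    for word in words:
        if len(word) < min_length:
            continue
        is_function_word = word in FUNCTION_WORDS
        if kind == "lexical" and is_function_word:
            continue
        if kind == "non-lexical" and not is_function_word:
            continue
        counts[word] += 1
    return counts


if __name__ == "__main__":
    sample = "The cat sat on the mat"
    print(word_occurrences(sample, min_length=3))
    print(word_occurrences(sample, kind="lexical"))
```

With a minimum length of 3, the word "on" is left out of the count because it has only two letters.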
Phrases in Common
Shared phrases of six words in length between two texts, then five, four, three and two
This software analyses your two texts for shared phrases of six words in length. It then checks for shared phrases of five words, then four, three and finally two.
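The descending search can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's actual method: the names tokenize, ngrams and shared_phrases are hypothetical, and the sketch reports every shared phrase at each length, including shorter phrases that sit inside a longer shared phrase. Whether the Seren Web tool filters those out is not stated here.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into words (letters, one inner apostrophe)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def ngrams(words, n):
    """Return the set of all runs of n consecutive words."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_phrases(text_a, text_b, max_len=6, min_len=2):
    """Map each phrase length, from max_len down to min_len, to the shared phrases."""
    words_a, words_b = tokenize(text_a), tokenize(text_b)
    result = {}
    for n in range(max_len, min_len - 1, -1):
        common = ngrams(words_a, n) & ngrams(words_b, n)
        result[n] = sorted(" ".join(phrase) for phrase in common)
    return result


if __name__ == "__main__":
    a = "I will see you at the station tomorrow"
    b = "She said I will see you at the station later"
    for length, phrases in shared_phrases(a, b).items():
        print(length, phrases)
```

Using sets means each shared phrase is listed once per length, however many times it appears in either text.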
Words in Common
Number of words and the number of instances of each word in common
This software analyses the number of words in common and the number of instances of each word in common between two texts.
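A comparison of this kind can be sketched in a few lines of Python. The function name words_in_common and its return shape are assumptions made for the example, not the tool's interface, and the tokenisation (lower-casing and splitting on non-letters) is one simple choice among several.

```python
import re
from collections import Counter


def words_in_common(text_a, text_b):
    """Return {word: (count in text_a, count in text_b)} for words found in both texts."""
    pattern = r"[a-z]+(?:'[a-z]+)?"
    counts_a = Counter(re.findall(pattern, text_a.lower()))
    counts_b = Counter(re.findall(pattern, text_b.lower()))
    common = counts_a.keys() & counts_b.keys()
    return {word: (counts_a[word], counts_b[word]) for word in sorted(common)}


if __name__ == "__main__":
    shared = words_in_common("The cat and the dog", "The dog barked")
    print(len(shared), "words in common")
    for word, (in_a, in_b) in shared.items():
        print(word, in_a, in_b)
```

The length of the returned dictionary gives the number of words in common, and each value gives the number of instances of that word in the first and second text.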
About the Corpus
In linguistics, a corpus is a large and structured set of texts used to do statistical analysis and hypothesis testing, checking occurrences or validating linguistic rules within a specific language territory.
The Seren Corpus is a growing collection of articles taken from the English-language pages of Wikinews, with an emphasis on the latest news items, to reflect current online use of English.
The Corpus will be available again soon.