About DEEDS Daters

Dater: Maximum Prevalence Method

The Maximum Prevalence Method for the dating of documents involves modelling the probability of occurrence of words across a time interval, in this case from 1066 to 1307. For each word contained in an undated document, the probability of its occurrence at different points in time is computed on the basis of the number of occurrences of the word relative to the total number of words found in the training documents from the specified points in time.
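
As a rough illustration of the relative-frequency idea described above, the following Python sketch computes, for each year, the number of occurrences of a word divided by the total number of words in the training documents from that year. The function name, the (year, text) input layout, and the whitespace tokenisation are assumptions made for the example; they are not taken from the DEEDS implementation.

```python
from collections import Counter

def word_probability_by_year(training_docs, word):
    """Return {year: occurrences of `word` / total words in that year}.

    training_docs is a list of (year, text) pairs (a hypothetical layout).
    Tokenisation is plain whitespace splitting, which is an assumption.
    """
    word_counts = Counter()
    total_counts = Counter()
    target = word.lower()
    for year, text in training_docs:
        tokens = text.lower().split()
        total_counts[year] += len(tokens)
        word_counts[year] += tokens.count(target)
    # Skip years with no words at all to avoid dividing by zero.
    return {
        year: word_counts[year] / total_counts[year]
        for year in sorted(total_counts)
        if total_counts[year] > 0
    }
```

The same counting applies to two-word phrases if the tokens are replaced by consecutive word pairs.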


If you enter the phrase Sancte Marie in the "text pattern search" and click the "plot" button, you will be directed to the illustration below. The first panel plots, on a log scale, the total number of two-word phrases (in blue) and the number of occurrences of the phrase Sancte Marie (in red) at each point in time (in years), shown along the lower horizontal axis. A log scale is used because the number of occurrences of Sancte Marie is so small relative to the total number of two-word phrases that the two could not otherwise be shown on the same graph. The middle panel graphs the logit of the probability of occurrence of the phrase Sancte Marie. In years where the frequency of Sancte Marie is high relative to the total frequency of all two-word phrases, the logit of its probability of occurrence is also high. The logit transform is easier to read on a graph because, unlike the probability itself, it is not constrained to lie between zero and one. The third panel shows the probability of occurrence of the phrase Sancte Marie, whose value is constrained to lie between zero and one.
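
For readers unfamiliar with the logit, the short Python sketch below shows the standard transform and its inverse. The function names are chosen for this example only. Note that the logit is undefined for probabilities of exactly zero or one, so a real plot would need some form of smoothing or clipping first.

```python
import math

def logit(p):
    """Map a probability strictly between 0 and 1 to the whole real line."""
    return math.log(p / (1.0 - p))

def inverse_logit(x):
    """Map any real number back to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))
```

Because the logit is strictly increasing, years with a higher probability of occurrence always have a higher logit, which is why the middle and third panels rise and fall together.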

The default value of the smoothing parameter is 12. Low parameter values result in highly fluctuating probability curves, and high parameter values result in overly smooth, flat curves.
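
The text above does not say which smoother is used, so the following Python sketch is only an illustration of how a smoothing parameter trades fluctuation against flatness. It assumes a Gaussian kernel average in which the parameter acts as a bandwidth measured in years; both the kernel and the function name are assumptions, not a description of the DEEDS software.

```python
import math

def kernel_smooth(years, values, bandwidth):
    """Smooth `values` over `years` with a Gaussian kernel (assumed smoother).

    A small bandwidth follows the raw values closely; a large bandwidth
    averages over many years and flattens the curve.
    """
    smoothed = []
    for y0 in years:
        weights = [math.exp(-0.5 * ((y - y0) / bandwidth) ** 2) for y in years]
        total = sum(weights)  # always >= 1 because the weight at y0 is 1
        smoothed.append(sum(w * v for w, v in zip(weights, values)) / total)
    return smoothed
```

Applied to a series that alternates between low and high values, a bandwidth well below one year leaves the zigzag almost untouched, while a bandwidth of 12 reduces it to a nearly level line.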

To date a document, the probabilities of occurrence of all the words in the document are aggregated to estimate the most likely point in time at which the undated document was written. To use this dater, click "Dater: Maximum Prevalence Method" and enter the text of the document. A date estimate for the document will be returned. If you then click "Plot probability of", the probability of occurrence of the document at each point in time will be graphed on a log scale.

The time point where the probability of occurrence of the document is at its highest is the estimated date of the document.
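
The aggregation and selection steps can be sketched in Python as follows. The sketch assumes that word probabilities are combined by summing their logarithms (that is, treating words as independent) and that unseen words receive a small floor probability; neither detail is stated above, and the function name, the input layout, and the floor value are hypothetical.

```python
import math

def estimate_date(doc_words, prob_by_year, floor=1e-9):
    """Return (best_year, {year: log-probability of the document}).

    prob_by_year maps year -> {word: probability} (a hypothetical layout).
    Words missing for a year get `floor`, an assumed value, so that the
    logarithm is always defined.
    """
    log_scores = {}
    for year, word_probs in prob_by_year.items():
        log_scores[year] = sum(
            math.log(max(word_probs.get(w, 0.0), floor)) for w in doc_words
        )
    # The estimated date is the year with the highest document log-probability.
    best_year = max(log_scores, key=log_scores.get)
    return best_year, log_scores
```

Working with log-probabilities avoids numerical underflow when a document contains many words, and it matches the log scale used when the document's probability is plotted.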

Dater: Distance Based Method

This method works by measuring the similarity of an undated document to a weighted set of previously dated training documents: the higher the similarity between the undated document and a dated document from the training set, the higher the weight assigned to that document's date. An aggregate of the date-weight combinations is used as the estimate of the date of the undated document. To date a document, click "Dater: Distance Based Method" and enter the text to be dated. After you click "Date the text", the date estimate for the text is returned. In addition, the three training documents most similar to the undated text are listed.
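
A minimal Python sketch of this idea follows. The description above does not specify the similarity measure or the aggregation rule, so the sketch assumes Jaccard similarity on word sets and a similarity-weighted average of the training dates; these choices, along with the function names and the (year, text) input layout, are illustrative assumptions only.

```python
def jaccard(a, b):
    """Jaccard similarity of the word sets of two texts (assumed measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def distance_based_date(text, training_docs, top_k=3):
    """Return (estimated_year, top_k most similar training documents).

    training_docs is a list of (year, text) pairs (a hypothetical layout).
    Each training date is weighted by its similarity to the undated text.
    """
    scored = [(jaccard(text, doc), year, doc) for year, doc in training_docs]
    total = sum(score for score, _, _ in scored)
    if total == 0:
        return None, []  # nothing in the training set resembles the text
    estimate = sum(score * year for score, year, _ in scored) / total
    nearest = sorted(scored, key=lambda item: item[0], reverse=True)[:top_k]
    return estimate, nearest
```

The default of three nearest documents mirrors the three similar documents listed alongside the date estimate.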