
Parallel passages

Introduction

The term "parallel passages" as used on this site refers to sections of writing in either the same or different texts, ranging in length from a few characters to extended paragraphs, that share common word use and order. They may be attributed (as when one author cites the Book of Poetry, and the transmitted version of the Book of Poetry contains an identical or near-identical line) or unattributed (as when much of the "On dyeing" chapter of the Mozi closely parallels the equivalent chapter of the Lüshi Chunqiu).

Scholars have long been aware of the presence of such parallel passages in early Chinese texts, as well as their potential utility in better understanding those texts. Small differences between otherwise similar passages often provide clues to how a text evolved over time, and can help in identifying and correcting errors in transmitted texts as well as in improving our understanding of the relationships between texts.

To aid in the study of parallel passages, the Chinese Text Project contains an integrated database of hundreds of thousands of such parallels. The vast majority of these have been identified by software developed for this site that performs an automated comparison of sections of text throughout the database. Currently the scope of the parallel passage feature includes all pre-Qin and Han texts, together with the main text (excluding commentary) of the following texts: Taiping Yulan, Qunshu Zhiyao, Shishuo Xinyu, Yanshi Jiaxun, Wenxin Diaolong, Baopuzi, Renwuzhi, Jinlouzi, Shuijingzhu, Shenxianzhuan, Sanguozhi, Gaoshizhuan, Yiwen Leiju, Yilin, and Taiping Guangji.
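As a rough illustration of what an automated comparison of this kind can look like, the following Python sketch finds candidate parallels by counting shared character n-grams between passages. This is a generic technique chosen for illustration, not a description of the software used on this site; the function names, the n-gram length, the threshold, and the toy passages are all assumptions.

```python
from collections import defaultdict
from itertools import combinations


def char_ngrams(text, n):
    """Return the set of all character n-grams occurring in text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def candidate_pairs(passages, n=4, min_shared=2):
    """Map each pair of passage ids to its number of shared n-grams.

    Only pairs sharing at least min_shared distinct n-grams are kept.
    The values of n and min_shared are illustrative, not tuned.
    """
    # Inverted index: n-gram -> ids of the passages that contain it.
    index = defaultdict(set)
    for pid, text in passages.items():
        for gram in char_ngrams(text, n):
            index[gram].add(pid)

    # Count how many distinct n-grams each pair of passages has in common.
    shared = defaultdict(int)
    for ids in index.values():
        for pair in combinations(sorted(ids), 2):
            shared[pair] += 1

    return {pair: count for pair, count in shared.items() if count >= min_shared}


# Toy passages (placeholder strings, not real text).
passages = {
    "p1": "abcdefghij",
    "p2": "xxcdefghyy",
    "p3": "qrstuvwxyz",
}
print(candidate_pairs(passages))
```

Because characters rather than words are compared, the same sketch applies unchanged to unsegmented Chinese text. A practical system would also need to handle textual variants and merge overlapping matches.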

Getting started

Text reuse in Zhuangzi 1.1.

Text reuse in early Chinese texts is so extensive that in pre-Qin and Han texts a majority of passages incorporate some degree of parallelism with at least one other passage. Whenever a paragraph on the Chinese Text Project contains any parallels in the parallel passage database, the parallel passage icon is displayed to the left of the passage; clicking on this icon displays the specific parallels identified.

At the top of the parallel passage display, the system provides a visual summary of text reuse within the selected paragraph. Parallel groups are indicated using shades of red; the stronger the shade of a region, the more parallel groups it belongs to. Clicking on a shaded region jumps down to the corresponding list of parallels below.
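The shading rule can be sketched as a per-character count of how many parallel groups cover each position. The Python below is a minimal illustration only: the representation of a group as a half-open (start, end) character span and the linear mapping from count to intensity are assumptions, not details of the site's implementation.

```python
def coverage_counts(length, spans):
    """Count, for each character position, how many spans cover it.

    spans is a list of half-open (start, end) character ranges, one per
    parallel group (a hypothetical representation).
    """
    counts = [0] * length
    for start, end in spans:
        for i in range(max(0, start), min(length, end)):
            counts[i] += 1
    return counts


def shade_intensity(counts):
    """Scale counts to the range 0.0-1.0 (1.0 = strongest shade of red)."""
    peak = max(counts, default=0)
    if peak == 0:
        return [0.0 for _ in counts]
    return [c / peak for c in counts]


# A 10-character paragraph covered by three overlapping groups.
counts = coverage_counts(10, [(0, 5), (3, 8), (4, 6)])
print(counts)
print(shade_intensity(counts))
```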

Below the visual summary of reuse are groups of similar sections of text from throughout the textual database. Within a group, moving the mouse cursor over any one of the parallel passages highlights in red those parts of that passage that are missing from the other versions, and highlights in green, within the other versions, all characters that do not occur in the passage hovered over. To remove the highlights, double-click anywhere outside the set of parallel passages. For example, in the parallel group titled 禮三本（一）, hovering over the passage from the Da Dai Liji highlights in red "禮有三本", because this phrase is missing from the Shiji version, as well as "生", which is recorded as "性" in the Xunzi version. Hovering over the Xunzi passage highlights only one character in the Shiji version, "則", because all the other characters present in the Shiji version agree with the Xunzi version.
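The highlighting behavior described above can be approximated with a character-level diff. The following Python sketch uses the standard library's difflib.SequenceMatcher; it is an assumption about how such highlights could be computed, not the site's actual code, and the strings are toy examples loosely modeled on the case above rather than verified quotations from any edition.

```python
from difflib import SequenceMatcher


def hover_highlights(hovered, others):
    """Compute red and green highlights for one hovered passage.

    Returns (red, green): red is the string of characters in the hovered
    passage that are missing from at least one other version; green maps
    each other version's name to the characters in it that do not occur
    in the hovered passage.
    """
    red_positions = set()
    green = {}
    for name, text in others.items():
        matcher = SequenceMatcher(None, hovered, text, autojunk=False)
        extra = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag in ("delete", "replace"):
                # Present in the hovered passage, absent from this version.
                red_positions.update(range(i1, i2))
            if tag in ("insert", "replace"):
                # Present in this version, absent from the hovered passage.
                extra.append(text[j1:j2])
        green[name] = "".join(extra)
    red = "".join(hovered[i] for i in sorted(red_positions))
    return red, green


# Toy strings for illustration only (not verified quotations).
hovered = "禮有三本天地者生之本也"
others = {
    "version_a": "天地者生之本也",
    "version_b": "禮有三本天地者性之本也則",
}
red, green = hover_highlights(hovered, others)
print(red)
print(green)
```

Taking the union of differences across all other versions mirrors the description above, in which a single hover marks both a phrase missing from one version and a character that varies in another.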

Parallels between different chapters, texts, or categories of text can also be searched using the Advanced search function.

Corpus scale analysis

The paper cited below also describes how text reuse can be visualized on a corpus scale, showing the extent to which different works contain parallels to other works in the corpus. The image below presents a summary of reuse in the pre-Qin and Han corpus in particular, represented as a weighted network graph:

The summary data used to create this visualization can be downloaded as a GraphViz (.gv) file suitable for use with various network visualization tools including Gephi.
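For readers who prefer to inspect the summary data programmatically rather than in a visualization tool, the sketch below reads weighted edges from GraphViz text and totals the edge weights per work. The exact layout of the downloadable file is not described here, so the edge syntax assumed by the regular expression (undirected edges between quoted node names with a weight attribute) and the sample data are assumptions that may need adjusting to the real file.

```python
import re
from collections import defaultdict

# Assumed edge form: "Work A" -- "Work B" [weight=12];
EDGE_RE = re.compile(
    r'"([^"]+)"\s*--\s*"([^"]+)"\s*\[[^\]]*?\bweight\s*=\s*"?([0-9]+(?:\.[0-9]+)?)"?'
)


def weighted_degree(gv_text):
    """Sum the weights of all edges touching each node."""
    totals = defaultdict(float)
    for source, target, weight in EDGE_RE.findall(gv_text):
        value = float(weight)
        totals[source] += value
        totals[target] += value
    return dict(totals)


# Made-up sample data in the assumed format (not taken from the real file).
SAMPLE_GV = """
graph reuse {
  "Text A" -- "Text B" [weight=12];
  "Text A" -- "Text C" [weight=3];
  "Text B" -- "Text C" [weight=5];
}
"""

totals = weighted_degree(SAMPLE_GV)
for work, total in sorted(totals.items(), key=lambda item: -item[1]):
    print(work, total)
```

A full DOT parser would be more robust than a regular expression; the regex is used here only to keep the sketch free of third-party dependencies.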

Credits

The parallel passage function required a considerable amount of time and effort to create. Please appropriately acknowledge any use of this data in your research. The techniques used to create this data are described and evaluated in the following paper, which you may wish to cite when making use of the data:

Donald Sturgeon. 2017. Unsupervised Identification of Text Reuse in Early Chinese Literature. Digital Scholarship in the Humanities.

To identify text reuse in texts not included in the study, you may also wish to try the Text Tools plugin for ctext, which provides easy-to-use tools for identifying and visualizing text reuse in arbitrary texts (including any texts available in ctext).