Chinese Text Project

Digital humanities



The Chinese Text Project Application Programming Interface (CTP API) allows flexible access to large amounts of textual data and metadata from this site, which can be used for text mining and other digital humanities studies. It can also be used to integrate content from the site with external tools, including Text Tools and MARKUS (described below).

The API is platform-agnostic and can be used from any modern programming environment. Supported modules exist for Python, designed to facilitate use in digital humanities teaching and research. A series of tutorials on the Digital Sinology site provides step-by-step instructions for accessing the API using the Python module.
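
As a rough illustration of platform-agnostic access, the sketch below calls the API over plain HTTP using only the Python standard library, rather than the supported Python module. The base URL, the `gettext` function name, the `urn` parameter, and the `fulltext` and `error` response fields are assumptions made for this example and should be checked against the API documentation; the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL for the API; verify against the API documentation.
API_BASE = "https://api.ctext.org"


def build_url(function, **params):
    """Build the request URL for an API function (hypothetical helper)."""
    query = urlencode(params)
    return f"{API_BASE}/{function}?{query}" if query else f"{API_BASE}/{function}"


def extract_paragraphs(response):
    """Return the paragraphs from a decoded JSON response.

    Assumes a successful response carries a 'fulltext' list and a failed
    one carries an 'error' object; both field names are assumptions.
    """
    if "error" in response:
        raise RuntimeError(str(response["error"]))
    return list(response.get("fulltext", []))


def get_text(urn):
    """Fetch one textual item by URN (performs a network request)."""
    with urlopen(build_url("gettext", urn=urn), timeout=30) as resp:
        return extract_paragraphs(json.load(resp))
```

Calling `get_text("ctp:analects/xue-er")` would then request a single chapter, assuming that URN is valid. Keeping URL construction and response parsing in separate functions lets both be checked without network access.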

Linked Open Data

Structured data created and maintained through the Data Wiki section of the site can be downloaded in bulk for research and analysis.

Text Tools

The Text Tools plugin for the Chinese Text Project provides powerful tools for analysis and visualization of the contents of Chinese texts, including automated analysis of text reuse, regular expressions, n-grams, and more. Some example visualizations are shown below; see the introductions to text reuse and regular expressions on the Digital Sinology site for detailed explanations of the examples, and the online tutorial for step-by-step instructions.
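
To give a sense of what n-gram and regular expression analysis involve, here is a minimal standalone sketch in Python. It is not the Text Tools implementation; the punctuation set, the `ngram_counts` helper, and the sample passage are choices made for this illustration only.

```python
import re
from collections import Counter

# Punctuation and whitespace treated as segment boundaries (an assumption;
# a real tool may handle punctuation differently).
PUNCT = re.compile(r"[，。；：？！、「」『』\s]+")

SAMPLE = "學而時習之，不亦說乎？有朋自遠方來，不亦樂乎？"


def ngram_counts(text, n):
    """Count character n-grams, never spanning a punctuation boundary."""
    counts = Counter()
    for segment in PUNCT.split(text):
        for i in range(len(segment) - n + 1):
            counts[segment[i:i + n]] += 1
    return counts


# Bigrams that occur more than once in the sample passage.
repeated = {gram: c for gram, c in ngram_counts(SAMPLE, 2).items() if c > 1}

# A regular expression capturing the character between 不亦 and 乎.
matches = re.findall(r"不亦(.)乎", SAMPLE)
```

Here `repeated` holds the bigrams shared by the two parallel clauses, and `matches` collects the single character that differs between them. The same pattern-with-a-gap idea underlies many regular expression searches over classical texts.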


The MARKUS platform, developed by Brent Ho and Hilde De Weerdt with funding from the European Research Council and Digging into Data, provides an online tool for automated and manual markup of named entities including personal names, place names, temporal references, and official titles in historical Chinese texts. A plugin for the Chinese Text Project allows texts from this site to be loaded directly into the MARKUS system.

Please note that MARKUS requires Google Chrome. Access to some API features requires an institutional subscription.


See also: conference papers on