Chinese Text Project
Digital humanities
Tools
CTP API
The Chinese Text Project Application Programming Interface (CTP API) provides flexible access to large amounts of textual data and metadata from this site, which can be used for text mining and other digital humanities research. It can also be used to integrate content from the site with external tools, including Text Tools and MARKUS (described below).
The API is platform agnostic and can be used from any modern programming environment. Supported Python modules are available, designed to facilitate use in digital humanities teaching and research. A series of tutorials on the Digital Sinology site provides step-by-step instructions for accessing the API using the Python module.
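Because the API is platform agnostic, it can in principle be called over plain HTTP without the Python module. The following is a minimal sketch using only the Python standard library. The base URL, the `gettext` endpoint, the `urn` parameter, and the `fulltext` and `error` response fields are assumptions made for illustration and should be checked against the API documentation; the helper function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL and endpoint name; confirm both in the CTP API documentation.
API_BASE = "https://api.ctext.org"


def build_gettext_url(urn, base=API_BASE):
    """Build a request URL for the (assumed) gettext endpoint."""
    return f"{base}/gettext?{urlencode({'urn': urn})}"


def extract_paragraphs(payload):
    """Return the paragraphs of a decoded JSON response.

    Assumes the response carries a 'fulltext' list on success and an
    'error' entry on failure; both field names are assumptions.
    """
    if "error" in payload:
        raise RuntimeError(f"API error: {payload['error']}")
    return list(payload.get("fulltext", []))


def fetch_paragraphs(urn):
    """Request one textual unit and return its paragraphs as a list of strings."""
    with urlopen(build_gettext_url(urn), timeout=30) as response:
        return extract_paragraphs(json.load(response))


if __name__ == "__main__":
    # Illustrative URN for a single chapter; requires network access.
    for paragraph in fetch_paragraphs("ctp:analects/xue-er"):
        print(paragraph)
```

Separating URL construction and response parsing from the network call keeps the parsing logic testable offline. For teaching and research use, the supported Python module is likely to be the more convenient route.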
Linked Open Data
Structured data created and maintained through the Data Wiki section of the site can be downloaded in bulk for research and analysis.
Text Tools
The Text Tools plugin for the Chinese Text Project provides powerful tools for analysis and visualization of the contents of Chinese texts, including automated analysis of text reuse, regular expressions, n-grams, and more. Some example visualizations are shown below; see the introductions to text reuse and regular expressions on the Digital Sinology site for detailed explanations of the examples, and the online tutorial on dsturgeon.net for step-by-step instructions.
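To give a sense of what an n-gram analysis computes, here is a minimal standard-library sketch that counts character n-grams in a short passage. It is an illustration only and not the Text Tools implementation; the function name `char_ngrams` and the choice to count within punctuation-delimited segments are assumptions made for this example.

```python
import re
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams, never spanning punctuation or whitespace."""
    counts = Counter()
    # \w+ matches runs of word characters, including Chinese characters,
    # so each punctuation-delimited segment is handled separately.
    for segment in re.findall(r"\w+", text):
        for i in range(len(segment) - n + 1):
            counts[segment[i:i + n]] += 1
    return counts


sample = "學而時習之，不亦說乎？有朋自遠方來，不亦樂乎？"
bigrams = char_ngrams(sample, 2)
print(bigrams.most_common(1))  # [('不亦', 2)]
```

Counting within segments avoids producing n-grams such as 之不 that exist only because a punctuation mark was removed. Whether that is desirable depends on the edition and on how the punctuation was added.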
MARKUS
The MARKUS platform, developed by Brent Ho and Hilde De Weerdt with funding from the European Research Council and Digging into Data, provides an online tool for automated and manual markup of named entities including personal names, place names, temporal references, and official titles in historical Chinese texts. A plugin for the Chinese Text Project allows texts from this site to be loaded directly into the MARKUS system.
Please note that MARKUS requires Google Chrome. Access to some API features requires an institutional subscription.
Research
- Sturgeon, D. Unsupervised Identification of Text Reuse in Early Chinese Literature, Digital Scholarship in the Humanities, 2017.
- Sturgeon, D. Unsupervised Extraction of Training Data for pre-Modern Chinese OCR. 30th International Florida Artificial Intelligence Research Society [FLAIRS] Conference, 2017.
See also: conference papers on dsturgeon.net.