Chinese Text Project

Wiki style and formatting

Structure

Texts in the Wiki section of the site are composed of one or more sections. Each section has a title (e.g. "里仁"), a sequence number (e.g. "4"), and content (e.g. the contents of the 里仁 chapter of the Analects). Usually sections correspond to divisions in the original text - typically 篇 or 卷. Where the original text is very short, the Wiki text will have only a single section, with the same title as the text itself (e.g. "三字經"). Each section should contain all text and any subsections belonging to that section. For example:

Wiki text title: 論語
Section title: 學而
Sequence: 1
Contents: 子曰:「學而時習之,不亦說乎?...
...
子曰:「不患人之不己知,患不知人也。」
Section title: 為政
Sequence: 2
Contents: 子曰:「為政以德,譬如北辰,居其所而眾星共之。」...

...

Section title: 堯曰
Sequence: 20
Contents: 堯曰:「咨!爾舜!天之曆數在爾躬。...

Wiki text title: 三字經
Section title: 三字經
Sequence: 1
Contents: 人之初,性本善。性相近,習相遠。

苟不教,性乃遷。教之道,貴以專。

...

勤有功,戲無益。戒之哉,宜勉力。

A single section can contain multiple subsections marked up as described below, provided that this does not make the section overly long (i.e. longer than can easily be displayed on a single web page):

Wiki text title: 玉堂叢語
Section title: 卷之一
Sequence: 1
Contents: *行誼

贊善大夫龍泉章公溢,始生,其音如鐘,...

...蓋三歸而修創利物業三焉。

*文學

申屠衡,長洲人。幼學於楊維楨,明春秋,肆力古文。...

*言語

國初郊祝文有予、我字,上怒,將罪作者。...

...

Editions

To ensure that the Wiki is edited constructively and that repeated edits by different individuals improve the quality of the text, each Wiki text should specify the edition on which it is based. When editing a text, the Wiki version should always agree with the characters used in the base edition. If you believe that the base text itself is incorrect and must be corrected to make it intelligible, please do not simply change the Wiki text so that it disagrees with the base text and agrees with your view of what the corrected text should say (or with what another edition says). Instead, please use the correction markers to indicate what the base text says and what it should be emended to. For example:

Correction type: Deletion
Usage: ...(text in base text to be deleted)=explanation for change=...
Example: 待周不乘馬而後不乘馬(而後不乘馬)=衍文。=。
Example displays as: 待周不乘馬而後不乘馬而後不乘馬衍文。

Correction type: Insertion
Usage: ...[text not in base text to be inserted]=explanation for change=...
Example: 古之民,未知為宮[室]=據孫詒讓《墨子閒詁》補。=時,
Example displays as: 古之民,未知為宮據孫詒讓《墨子閒詁》補。時,

Correction type: Correction
Usage: ...(text in base text to be altered)[new text]=explanation for change=...
Example: 故(食)[倉]=據孫詒讓《墨子閒詁》改。=無備粟,
Example displays as: 據孫詒讓《墨子閒詁》改。無備粟,
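
The three correction markers can be read mechanically. The following Python sketch is not the site's own implementation; it is a minimal illustration that assumes markers are never nested and that explanations contain no "=" character. The function names are hypothetical. It recovers both the base-text reading and the emended reading from a marked-up string.

```python
import re

# Matches the three marker shapes, each followed by =explanation=:
#   (old)[new]  correction,  (old)  deletion,  [new]  insertion
# Assumption: markers are not nested and explanations contain no "=".
MARKER = re.compile(
    r"(?:\((?P<old>[^()]*)\)\[(?P<new>[^\[\]]*)\]"
    r"|\((?P<deleted>[^()]*)\)"
    r"|\[(?P<inserted>[^\[\]]*)\])"
    r"=(?P<why>[^=]*)="
)


def base_reading(marked: str) -> str:
    """Return the text as it stands in the base edition."""
    def keep_base(m: re.Match) -> str:
        # Keep what the base text has; drop anything the editor adds.
        return m.group("old") or m.group("deleted") or ""
    return MARKER.sub(keep_base, marked)


def emended_reading(marked: str) -> str:
    """Return the text with every proposed emendation applied."""
    def keep_new(m: re.Match) -> str:
        # Keep what the editor supplies; drop what is to be removed.
        return m.group("new") or m.group("inserted") or ""
    return MARKER.sub(keep_new, marked)


def explanations(marked: str) -> list[str]:
    """Return the explanation attached to each marker, in order."""
    return [m.group("why") for m in MARKER.finditer(marked)]
```

For the correction example, `base_reading` gives 故食無備粟, and `emended_reading` gives 故倉無備粟,. Because the base reading is always recoverable, anyone reviewing an edit can still compare the Wiki text against the base edition.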

If you are editing the Wiki text because it incorrectly disagrees with the base text (which you have checked in the library), please copy and paste the URL of the library page into the "Description" field if possible so that anyone reviewing your edit can easily see why you made the changes.

Punctuation

The following punctuation characters should be used in texts on the Wiki:

Characters: 。
Usage: Period. Used to separate sentences.

Characters: ,
Usage: Comma. Used to separate verbal phrases.

Characters: 、
Usage: List marker. Used to separate noun phrases.

Characters: !
Usage: Exclamation mark.

Characters: ?
Usage: Question mark.

Characters: 「 」 “ ”
Usage: Used to enclose quotations. Note that “ and ” are automatically normalized to 「 and 」 when an edit is submitted.

Characters: 『 』 ‘ ’
Usage: Used to enclose quotations within other quotations. Note that ‘ and ’ are automatically normalized to 『 and 』 when an edit is submitted.

Characters: 【 】
Usage: Used to indicate that the text contained is a categorization applying to the immediately following paragraph (e.g. "【疏】").

Characters: 《 · 》
Usage: Used to indicate that the text contained is a reference to a book or chapter title. Where the reference is composed of both a book and a chapter title, "·" can be placed between the two. Only used when citing other books or chapters; chapter and section headings of the book being entered should not include these marks, as they will be added automatically.

Characters: ●
Usage: Used to indicate that the digital edition is missing one character that appears in the base text (e.g. because its Unicode representation is unknown or does not exist).

Characters: □
Usage: Used to indicate that the base text itself is missing one character, either because this is explicitly indicated in the base text in some way, or because the base text is physically damaged at this location.

No other punctuation marks should be used, except where they appear in the base text (e.g. ○ or ◊), or have functional roles listed below. In particular, English quotation marks (i.e. the "half-width" quotation marks " and ') and other punctuation marks (e.g. , . ; :) should never be used.

Characters with functional roles

Certain characters are used in the Wiki to specify particular functional roles of sections of text. These characters must only be used as described in the following table:

Characters: *
Usage: Used at the start of a line, specifies that the line is a title at the highest level within the currently edited section. Note that this is not required for the title of the section itself, which must be entered in the "Title" field.
Example: *記遊
Example displays as: 記遊》

Characters: **
Usage: Used at the start of a line, specifies that the line is a title at the second-highest level within the currently edited section.
Example: **記過合浦
Example displays as: ·記過合浦》

Characters: { }
Usage: Specifies that the characters between the { and } are printed in larger text. Typically used when a text is primarily a commentary to indicate sections of the original text.
Example: {染於蒼則蒼,}《廣雅釋器》云:「蒼、青也。」
Example displays as: 染於蒼則蒼,《廣雅釋器》云:「蒼、青也。」

Characters: {{ }}
Usage: Specifies that the characters between the {{ and }} are commentary and should be printed in smaller text. Typically used when a text is not primarily a commentary, but contains comments distinguished as such in the base text.
Example: 《河圖》曰:元氣闓{{音開}}陽為天。
Example displays as: 《河圖》曰:元氣闓音開陽為天。

Characters: {{{ }}}
Usage: Specifies that the characters between the {{{ and }}} are marginal notes or textual remarks not occurring in the main body of the text.
Example: 王,{{{王舊作命。改之}}}曰:烏虖,父師,
Example displays as: 王,王舊作命。改之曰:烏虖,父師,

Characters: [return]
Usage: Separates paragraphs. Paragraphs should be complete meaningful units of text, and should never end with the marks 「 『 : , or 、.

Characters: |
Usage: Specifies that a line-break should be included where the mark occurs, but that this should not be treated as a paragraph break.
Example: 關關雎鳩、在河之洲。|窈窕淑女、君子好逑。
Example displays as:
關關雎鳩、在河之洲。
窈窕淑女、君子好逑。

Characters: ●=Description
Usage: Used to indicate that the digital edition is missing one character that appears in the base text, and provide a description of the character.
Example: 山林誰問●=上「髟」下「丐」=蕭蕭。
Example displays as: 山林誰問●缺字:上「髟」下「丐」蕭蕭。

Characters: 【 】
Usage: Specifies that the characters between 【 and 】 should be displayed inverted (i.e. white on black). Use only as indicated in the specified base text.
Example: 【指歸】
Example displays as: 指歸

Characters: 〖 〗
Usage: Specifies that the characters between 〖 and 〗 should be displayed inside a solid border. Use only as indicated in the specified base text.
Example: 〖指歸〗
Example displays as: 指歸
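
To make the inline markers concrete, the following Python sketch splits one paragraph into typed spans. It is an illustration only, not the site's renderer, and the span names ("main", "large", "commentary", "marginal", "missing", "linebreak") are hypothetical labels chosen here. It assumes braces are balanced and not nested, and it ignores title markers, correction markers, and the 【 】 and 〖 〗 pairs.

```python
import re

# Longest brace marker first, so that {{{ }}} is not read as {{ }} or { }.
SPAN = re.compile(
    r"\{\{\{(?P<marginal>.*?)\}\}\}"      # marginal notes / textual remarks
    r"|\{\{(?P<commentary>.*?)\}\}"       # commentary, smaller text
    r"|\{(?P<large>.*?)\}"                # larger text
    r"|●=(?P<missing>[^=]*)="             # missing character + description
    r"|(?P<linebreak>\|)"                 # line-break inside a paragraph
)


def tokenize(paragraph: str) -> list[tuple[str, str]]:
    """Split a paragraph into (kind, text) spans."""
    tokens = []
    pos = 0
    for m in SPAN.finditer(paragraph):
        if m.start() > pos:
            # Unmarked text between two markers belongs to the main text.
            tokens.append(("main", paragraph[pos:m.start()]))
        kind = m.lastgroup
        tokens.append((kind, "" if kind == "linebreak" else m.group(kind)))
        pos = m.end()
    if pos < len(paragraph):
        tokens.append(("main", paragraph[pos:]))
    return tokens
```

For example, `tokenize("《河圖》曰:元氣闓{{音開}}陽為天。")` yields a main-text span, a "commentary" span containing 音開, and a second main-text span.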

XML tags

The following tags are used to implement various additional functions. In most cases these are maintained automatically through normal editing functions; please do not create or modify these tags by hand except where absolutely necessary.

XML tag: <scanbegin.../>
Purpose: Aligning transcription and scan.

XML tag: <scanbreak.../>
Purpose: Aligning transcription and scan.

XML tag: <scanend.../>
Purpose: Aligning transcription and scan.

XML tag: <picture.../>
Purpose: Including an illustration in a text.

XML tag: <character.../>
Purpose: Including a non-Unicode character in a text.

XML tag: <entity...>...</entity>
Purpose: Marking a named entity (e.g. person name) in a text.

No other HTML or XML tags should be added to any texts.
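
A simple check for stray markup follows from the list of permitted tags. The Python sketch below is an illustration with a hypothetical function name. It assumes tag names are written in lowercase exactly as listed above, and it inspects tag names only, since the attributes of these tags are not described here.

```python
import re

# The six tag names listed above; anything else should not appear in a text.
ALLOWED_TAGS = {"scanbegin", "scanbreak", "scanend", "picture", "character", "entity"}

# Captures the name of any opening, closing, or self-closing tag.
TAG_NAME = re.compile(r"</?\s*([A-Za-z][A-Za-z0-9_-]*)")


def disallowed_tags(text: str) -> list[str]:
    """Return the sorted names of tags that are not on the allowed list."""
    return sorted({name for name in TAG_NAME.findall(text) if name not in ALLOWED_TAGS})
```

An empty list means no unexpected tag names were found; it does not confirm that the permitted tags are well-formed.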

Uploading new resources from scratch

When uploading a completely new text using the upload page, regardless of the length of the text or its sections, you should enter the text using heading markers as if the entire text were composed of a single section. The system will automatically split it into sections according to the title markers entered; the highest-level markers are treated as section titles. So for instance, if uploading the full text of the Analects, the data to be input would be:

*學而
子曰:「學而時習之,不亦說乎?...
...
子曰:「不患人之不己知,患不知人也。」

*為政

子曰:「為政以德,譬如北辰,居其所而眾星共之。」...
...

*堯曰

堯曰:「咨!爾舜!天之曆數在爾躬。...

...

This would create a new text composed of twenty sections, one for each chapter of the Analects.
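
The splitting behaviour described above can be sketched as follows. This Python example is not the upload page's actual code; it is a minimal illustration with a hypothetical function name. It assumes that lines starting with a single "*" open a new section, that "**" lines stay inside the current section's contents, and that any text before the first title marker is ignored.

```python
def split_into_sections(upload_text: str) -> list[tuple[int, str, str]]:
    """Split uploaded text into (sequence, title, contents) tuples."""
    sections = []
    title = None
    body = []

    def flush() -> None:
        # Store the section collected so far, numbering sections from 1.
        if title is not None:
            sections.append((len(sections) + 1, title, "\n".join(body).strip()))

    for line in upload_text.splitlines():
        if line.startswith("*") and not line.startswith("**"):
            flush()
            title = line[1:].strip()
            body = []
        elif title is not None:
            # Lower-level titles (**) and ordinary text stay in the contents.
            body.append(line)
    flush()
    return sections
```

Applied to the full Analects input, this would return twenty tuples, with 學而 as sequence 1 and 堯曰 as sequence 20.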

Metadata

Textual resources in the Wiki section of the site incorporate metadata describing the text, accessible via the "Edit" link from the contents page of the text. When making changes to this metadata, please pay attention to the following aspects in particular:

Tags

The "Tags" field may include data describing how the text should be treated within the Chinese Text Project system. The tags field should not contain any data not described in the table below. If more than one tag is present, tags should be delimited using ",".

Tag: TEXTDB
Meaning: The text is a resource from the textual database (as opposed to the Wiki). No Wiki resource should have this tag present.

Tag: WORKSET(urn)
Meaning: This text represents an instance of the same abstract work as the text specified by the URN urn. urn must be different from the URN of the text to which this tag applies. The text referenced by urn should be the default representative text of this work.

Tag: OCR_PRIMARY
Meaning: The text is an OCR-derived transcription, and no proofread edition of the same text was available when it was transcribed.

Tag: OCR_SECONDARY(textid)
Meaning: The text is an OCR-derived transcription, and another edition of the same text (specified by textid) was available when it was transcribed.

Tag: OCR_MATCH
Meaning: The text results from a proofread transcription that has been automatically matched to the corresponding scanned text.

Tag: OCR_FAILEDMATCH(textid)
Meaning: The text is an OCR-derived transcription, and another edition of the same text (specified by textid) was available when it was transcribed. This textual transcription was created because it was not possible to match the existing transcription to the scanned text.

Tag: OCR_CORRECTED
Meaning: The text is an OCR-derived transcription, but its contents have now been substantially proofread and corrected.

Tag: OCR_CORRECTED(nn)
Meaning: The text is an OCR-derived transcription, but approximately nn% of its contents have been proofread and corrected.

Tag: REDIRECT(urn)
Meaning: The text has been replaced with or merged into the text identified by the URN urn.

Tag: FORK(urn)
Meaning: This text was originally created as a copy of the text identified by the URN urn.

Tag: PUNCTUATED
Meaning: The text has modern punctuation as described in this document throughout.

Tag: PUNCTUATED_OLD
Meaning: The text has old-style punctuation throughout.

Tag: ANNOTATED
Meaning: The text contains at least one semantic annotation.

Composition date

The "Composition date" field specifies the dates of authorship of a text. This can either be a single number representing the year in which the text was composed, or a range of years expressed in the format "firstyear~lastyear". In the latter case, this represents either uncertainty as to when precisely the text was composed, or knowledge that the text was composed over an extended time period contained within the two specified years. When changing this field, cite the source for your change in the "Description of edit" field.
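
The two accepted forms of this field can be checked with a short parser. The Python sketch below is an illustration with a hypothetical function name. It treats a single year as a range whose two ends coincide, and it accepts a leading minus sign on the assumption, not stated above, that years before year 1 are written as negative numbers.

```python
import re

# "year" or "firstyear~lastyear"; a leading "-" is an assumption (see above).
DATE = re.compile(r"^(-?\d+)(?:~(-?\d+))?$")


def parse_composition_date(value: str) -> tuple[int, int]:
    """Return (first_year, last_year) for a Composition date value."""
    m = DATE.match(value.strip())
    if m is None:
        raise ValueError(f"not a year or year range: {value!r}")
    first = int(m.group(1))
    last = int(m.group(2)) if m.group(2) is not None else first
    if first > last:
        raise ValueError(f"first year is later than last year: {value!r}")
    return first, last
```

Returning a pair in both cases lets later code compare or sort texts by date without distinguishing single years from ranges.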

Texts, works, and worksets

In the Chinese Text Project system and documentation:

Texts on the Chinese Text Project are the textual resources most commonly interacted with, e.g. in the textual database and wiki. In many cases, the system includes more than one text corresponding to a single work. In order to record and manage the relationships between texts and works, the "workset" concept is applied; this is implemented technically by use of the WORKSET tag. Only texts which are similar should be grouped together using WORKSET tags. Texts are similar in the relevant sense if and only if their contents are substantially similar. Commentary is considered part of the text. Therefore:

Also note that similarity is defined in terms of substantially similar content. Therefore:

When adding WORKSET tags, please note that: