Wiki style and formatting

Structure

Texts in the Wiki section of the site are composed of one or more sections. Each section has a title (e.g. "里仁"), a sequence number (e.g. "4"), and content (e.g. the contents of the 里仁 chapter of the Analects). Usually sections correspond to divisions in the original text - typically 篇 or 卷. Where the original text is very short, the Wiki text will have only a single section, with the same title as the text itself (e.g. "三字經"). Each section should contain all text and any subsections belonging to that section. For example:

Wiki text title: 論語

Section title:	學而
Sequence:	1
Contents:	子曰：「學而時習之，不亦說乎？... ... 子曰：「不患人之不己知，患不知人也。」

Section title:	為政
Sequence:	2
Contents:	子曰：「為政以德，譬如北辰，居其所而眾星共之。」...

...

Section title:	堯曰
Sequence:	20
Contents:	堯曰：「咨！爾舜！天之曆數在爾躬。...

Wiki text title: 三字經

Section title:	三字經
Sequence:	1
Contents:	人之初，性本善。性相近，習相遠。苟不教，性乃遷。教之道，貴以專。 ... 勤有功，戲無益。戒之哉，宜勉力。

A single section can contain multiple subsections marked up as described below, provided that this does not make each section overly long (i.e. longer than easily displayed on a single web page):

Wiki text title: 玉堂叢語

Section title:	卷之一
Sequence:	1
Contents:
*行誼
贊善大夫龍泉章公溢，始生，其音如鐘，... ...蓋三歸而修創利物業三焉。
*文學
申屠衡，長洲人。幼學於楊維楨，明春秋，肆力古文。...
*言語
國初郊祝文有予、我字，上怒，將罪作者。...

...

Editions

To ensure that the Wiki is edited constructively and that repeated edits by different individuals improve the quality of the text, each Wiki text should specify the edition on which it is based. When editing a text, the Wiki version should always agree with the characters used in the base edition. If you believe that the base text itself is incorrect and must be corrected to make it intelligible, please do not simply change the Wiki text so that it disagrees with the base text and agrees with your view of what the corrected text should say (or what another edition does say). Instead, please use the correction markers to record both what the base text says and what it should be emended to. For example:

Correction type	Usage	Example	Example displays as
Deletion	...（text in base text to be deleted）＝explanation for change＝...	待周不乘馬而後不乘馬（而後不乘馬）＝衍文。＝。	待周不乘馬而後不乘馬而後不乘馬衍文。。
Insertion	...［text not in base text to be inserted］＝explanation for change＝...	古之民，未知為宮［室］＝據孫詒讓《墨子閒詁》補。＝時，	古之民，未知為宮室據孫詒讓《墨子閒詁》補。時，
Correction	...（text in base text to be altered）［new text］＝explanation for change＝...	故（食）［倉］＝據孫詒讓《墨子閒詁》改。＝無備粟，	故食倉據孫詒讓《墨子閒詁》改。無備粟，
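The three marker patterns above are regular enough to be read mechanically. The following is a minimal sketch, not the site's own implementation: the function name `apply_markers` and the returned tuple layout are assumptions made for illustration. It recovers the base-text reading, the emended reading, and the list of explanations from a marked-up string, and it leaves `●＝Description＝` markers untouched.

```python
import re

# Optional （old） followed by optional ［new］, then a ＝explanation＝ pair.
MARKER = re.compile(r"(?:（([^（）]*)）)?(?:［([^［］]*)］)?＝([^＝]*)＝")


def apply_markers(text):
    """Return (base_reading, emended_reading, notes) for a marked-up string."""
    notes = []

    def to_base(match):
        old, new, _why = match.groups()
        if old is None and new is None:
            return match.group(0)  # not a correction marker (e.g. ●＝...＝)
        return old or ""           # the base text keeps only the old reading

    def to_emended(match):
        old, new, why = match.groups()
        if old is None and new is None:
            return match.group(0)
        if old is not None and new is not None:
            kind = "correction"
        elif old is not None:
            kind = "deletion"
        else:
            kind = "insertion"
        notes.append((kind, old, new, why))
        return new or ""           # the emended text keeps only the new reading

    return MARKER.sub(to_base, text), MARKER.sub(to_emended, text), notes


base, emended, notes = apply_markers("故（食）［倉］＝據孫詒讓《墨子閒詁》改。＝無備粟，")
print(base)     # 故食無備粟，
print(emended)  # 故倉無備粟，
```

Keeping both readings recoverable from one string is the point of the markers: the Wiki text can stay faithful to the base edition while still recording the proposed emendation.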

If you are editing the Wiki text because it incorrectly disagrees with the base text (which you have checked in the library), please copy and paste the URL of the library page into the "Description" field if possible so that anyone reviewing your edit can easily see why you made the changes.

Punctuation

The following punctuation characters should be used in texts on the Wiki:

Characters	Usage
。	Period. Used to separate sentences.
，	Comma. Used to separate verbal phrases.
、	List marker. Used to separate noun phrases.
！	Exclamation mark.
？	Question mark.
「」 “ ”	Used to enclose quotations. Note that “ and ” are automatically normalized to 「 and 」 when an edit is submitted.
『』 ‘ ’	Used to enclose quotations within other quotations. Note that ‘ and ’ are automatically normalized to 『 and 』 when an edit is submitted.
【】	Used to indicate that the text contained is a categorization applying to the immediately following paragraph (e.g. "【疏】").
《 · 》	Used to indicate that the text contained is a reference to a book or chapter title. Where composed of both a book and chapter title, "·" can be placed between the two. Only used when citing other books or chapters; chapter and section headings of the book being entered should not include these marks, as they will be added automatically.
●	Used to indicate that the digital edition is missing one character that appears in the base text (e.g. because its Unicode representation is unknown or does not exist).
□	Used to indicate that the base text itself is missing one character, either because this is explicitly indicated in the base text in some way, or because the base text is physically damaged at this location.

No other punctuation marks should be used, except where they appear in the base text (e.g. ○ or ◊), or have functional roles listed below. In particular, English quotation marks (i.e. the "half-width" quotation marks " and ') and other half-width punctuation marks (e.g. , . ; :) should never be used.
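Before submitting an edit, the rules above can be checked roughly by script. The sketch below is illustrative only: `normalize_quotes` mimics the quotation-mark normalization described in the table, and `find_half_width` reports the half-width marks named above. Both function names are assumptions, and the check cannot know whether a mark appears in the base text, so its results still need human review. Anything inside angle brackets is skipped so that XML tags are not reported.

```python
import re

# Curly quotes and the corner brackets they are normalized to.
QUOTE_MAP = str.maketrans({"“": "「", "”": "」", "‘": "『", "’": "』"})

# Half-width marks that should not appear in Wiki texts.
HALF_WIDTH = set(",.;:\"'")

# Anything that looks like an XML tag; its contents are ignored by the check.
TAG = re.compile(r"<[^<>]*>")


def normalize_quotes(text):
    """Replace curly quotation marks with corner-bracket quotation marks."""
    return text.translate(QUOTE_MAP)


def find_half_width(text):
    """Return (index, character) pairs for half-width marks outside tags."""
    masked = TAG.sub(lambda m: " " * len(m.group(0)), text)
    return [(i, ch) for i, ch in enumerate(masked) if ch in HALF_WIDTH]


print(normalize_quotes("子曰：“學而時習之”"))  # 子曰：「學而時習之」
print(find_half_width("子曰:「學」."))          # [(2, ':'), (6, '.')]
```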

Characters with functional roles

Certain characters are used in the Wiki to specify particular functional roles of sections of text. These characters must only be used as described in the following table:

Characters	Usage	Example	Example displays as
*	Used at the start of a line, specifies that the line is a title at the highest level within the currently edited section. Note that this is not required for the title of the section itself, which must be entered in the "Title" field.	*記遊	《記遊》
**	Used at the start of a line, specifies that the line is a title at the second-highest level within the currently edited section.	**記過合浦	《·記過合浦》
{ }	Specifies that the characters between the { and } are printed in larger text. Typically used when a text is primarily a commentary to indicate sections of the original text.	{染於蒼則蒼，}《廣雅釋器》云：「蒼、青也。」	染於蒼則蒼，《廣雅釋器》云：「蒼、青也。」
{{ }}	Specifies that the characters between the {{ and }} are commentary and should be printed in smaller text. Typically used when a text is not primarily a commentary, but contains comments distinguished as such in the base text.	《河圖》曰：元氣闓{{音開}}陽為天。	《河圖》曰：元氣闓音開陽為天。
{{{ }}}	Specifies that the characters between the {{{ and }}} are marginal notes or textual remarks not occurring in the main body of the text.	王，{{{王舊作命。改之}}}曰：烏虖，父師，	王，王舊作命。改之曰：烏虖，父師，
[return]	Separates paragraphs. Paragraphs should be complete meaningful units of text, and should never end with the marks 「『：， or 、.
|	Specifies that a line-break should be included where the mark occurs, but that this should not be treated as a paragraph break.	關關雎鳩、在河之洲。|窈窕淑女、君子好逑。	關關雎鳩、在河之洲。窈窕淑女、君子好逑。
●＝Description＝	Used to indicate that the digital edition is missing one character that appears in the base text, and provide a description of the character.	山林誰問●＝上「髟」下「丐」＝蕭蕭。	山林誰問●缺字：上「髟」下「丐」蕭蕭。
【】	Specifies that the characters between 【 and 】 should be displayed inverted (i.e. white on black). Use only as indicated in the specified base text.	【指歸】	指歸
〖〗	Specifies that the characters between 〖 and 〗 should be displayed inside a solid border. Use only as indicated in the specified base text.	〖指歸〗	指歸
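As an illustration of how the three brace markers differ, the sketch below splits one line into labelled spans. It is an assumption-laden example rather than the site's parser: the function name `split_spans` and the labels "large", "small" and "note" are invented here, and nested braces are not handled.

```python
import re

# Try the longest delimiter first so that {{{ }}} is not read as { } or {{ }}.
BRACES = re.compile(r"\{\{\{(.*?)\}\}\}|\{\{(.*?)\}\}|\{(.*?)\}")


def split_spans(line):
    """Split a line into (label, text) pairs according to its brace markers."""
    spans = []
    pos = 0
    for match in BRACES.finditer(line):
        if match.start() > pos:
            spans.append(("plain", line[pos:match.start()]))
        if match.group(1) is not None:
            spans.append(("note", match.group(1)))   # {{{ }}} marginal note
        elif match.group(2) is not None:
            spans.append(("small", match.group(2)))  # {{ }} commentary
        else:
            spans.append(("large", match.group(3)))  # { } larger text
        pos = match.end()
    if pos < len(line):
        spans.append(("plain", line[pos:]))
    return spans


print(split_spans("《河圖》曰：元氣闓{{音開}}陽為天。"))
# [('plain', '《河圖》曰：元氣闓'), ('small', '音開'), ('plain', '陽為天。')]
```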

XML tags

The following tags are used to implement various additional functions. In most cases these are maintained automatically through normal editing functions; please do not create or modify these tags by hand except where absolutely necessary.

XML tag	Purpose
<scanbegin.../>	Aligning transcription and scan.
<scanbreak.../>	Aligning transcription and scan.
<scanend.../>	Aligning transcription and scan.
<picture.../>	Including an illustration in a text.
<character.../>	Including a non-Unicode character in a text.
<entity...>...</entity>	Marking a named entity (e.g. person name) in a text.

No other HTML or XML tags should be added to any texts.
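A quick way to spot tags that are not in the table above is to compare tag names against the permitted set. This is a rough sketch under stated assumptions: the function name `disallowed_tags` is hypothetical, only tag names are inspected (attributes are not validated, since their format is not described here), and matching is case-sensitive.

```python
import re

# Tag names listed in the table above.
ALLOWED = {"scanbegin", "scanbreak", "scanend", "picture", "character", "entity"}

# An opening or closing angle bracket immediately followed by a tag name.
TAG_NAME = re.compile(r"</?([A-Za-z][A-Za-z0-9_-]*)")


def disallowed_tags(text):
    """Return the sorted names of any tags that are not in the permitted set."""
    return sorted({name for name in TAG_NAME.findall(text) if name not in ALLOWED})


print(disallowed_tags("<entity>孔子</entity>曰<b>學</b>"))  # ['b']
```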

Uploading new resources from scratch

When uploading a completely new text using the upload page, regardless of the length of the text or its sections, you should enter the text using heading markers as if the entire text were composed of a single section. The system will automatically split it into sections according to the title markers entered; the highest-level markers are treated as section titles. So for instance, if uploading the full text of the Analects, the data to be input would be:

*學而
子曰：「學而時習之，不亦說乎？...
...
子曰：「不患人之不己知，患不知人也。」

*為政

子曰：「為政以德，譬如北辰，居其所而眾星共之。」...
...

*堯曰

堯曰：「咨！爾舜！天之曆數在爾躬。...

...

This would create a new text composed of twenty sections, one for each chapter of the Analects.
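The splitting behaviour described above can be pictured with the following sketch. It is not the upload page's actual code: the function name `split_sections` and the dictionary keys are assumptions, and any text appearing before the first highest-level title is simply dropped here because the intended handling of such text is not stated.

```python
def split_sections(upload):
    """Split uploaded text into sections at lines starting with a single *."""
    sections = []
    current = None
    for line in upload.splitlines():
        # A highest-level title starts with exactly one asterisk.
        if line.startswith("*") and not line.startswith("**"):
            current = {
                "sequence": len(sections) + 1,
                "title": line[1:].strip(),
                "lines": [],
            }
            sections.append(current)
        elif current is not None:
            # Lower-level titles and body text stay inside the section.
            current["lines"].append(line)
    for section in sections:
        section["contents"] = "\n".join(section.pop("lines")).strip()
    return sections


sample = "*學而\n子曰：「學而時習之，不亦說乎？\n\n*為政\n\n子曰：「為政以德"
for section in split_sections(sample):
    print(section["sequence"], section["title"])
# 1 學而
# 2 為政
```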

Metadata

Textual resources in the Wiki section of the site incorporate metadata describing the text, accessible via the "Edit" link from the contents page of the text. When making changes to this metadata, please pay attention to the following aspects in particular:

Tag	Meaning
TEXTDB	The text is a resource from the textual database (as opposed to the Wiki). No Wiki resource should have this tag present.
WORKSET(urn)	This text represents an instance of the same abstract work as the text specified by the URN urn. urn must be different from the URN for the text to which this tag applies. The text referenced by urn should be the default representative text of this work.
OCR_PRIMARY	The text is an OCR-derived transcription, and no proofread edition of the same text was available when it was transcribed.
OCR_SECONDARY(textid)	The text is an OCR-derived transcription, and another edition of the same text (specified by textid) was available when it was transcribed.
OCR_MATCH	The text results from a proofread transcription that has been automatically matched to the corresponding scanned text.
OCR_FAILEDMATCH(textid)	The text is an OCR-derived transcription, and another edition of the same text (specified by textid) was available when it was transcribed. This textual transcription was created because it was not possible to match the existing transcription to the scanned text.
OCR_CORRECTED	The text is an OCR-derived transcription; however, its contents have now been substantially proofread and corrected.
OCR_CORRECTED(nn)	The text is an OCR-derived transcription; however, approximately nn% of its contents have been proofread and corrected.
REDIRECT(urn)	The text has been replaced with or merged into the text identified by the URN urn.
FORK(urn)	This text was originally created as a copy of the text identified by the URN urn.
PUNCTUATED	The text has modern punctuation as described in this document throughout.
PUNCTUATED_OLD	The text has old-style punctuation throughout.
ANNOTATED	The text contains at least one semantic annotation.
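The tags in the table follow a simple NAME or NAME(argument) shape, which the sketch below parses and checks. It is illustrative only: the function name `parse_tag` is an assumption, the way multiple tags are stored or separated is not described here and is not modelled, and tag names outside the table are passed through unchecked because the table may not be exhaustive.

```python
import re

TAG_RE = re.compile(r"^([A-Z_]+)(?:\((.*)\))?$")

# Argument requirements as given in the table above.
NO_ARG = {"TEXTDB", "OCR_PRIMARY", "OCR_MATCH",
          "PUNCTUATED", "PUNCTUATED_OLD", "ANNOTATED"}
NEEDS_ARG = {"WORKSET", "OCR_SECONDARY", "OCR_FAILEDMATCH", "REDIRECT", "FORK"}
# OCR_CORRECTED may appear with or without an argument, so it is in neither set.


def parse_tag(tag):
    """Return (name, argument) for one tag; argument is None when absent."""
    match = TAG_RE.match(tag.strip())
    if match is None:
        raise ValueError(f"not a tag: {tag!r}")
    name, arg = match.groups()
    if name in NO_ARG and arg is not None:
        raise ValueError(f"{name} does not take an argument")
    if name in NEEDS_ARG and not arg:
        raise ValueError(f"{name} requires an argument")
    return name, arg


print(parse_tag("WORKSET(ctp:wb153836)"))  # ('WORKSET', 'ctp:wb153836')
print(parse_tag("OCR_CORRECTED"))          # ('OCR_CORRECTED', None)
```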

Composition date

The "Composition date" field specifies the dates of authorship of a text. This can either be a single number representing the year in which the text was composed, or a range of years expressed in the format "firstyear~lastyear". In the latter case, this represents either uncertainty as to when precisely the text was composed, or knowledge that the text was composed over an extended time period contained within the two specified years. When changing this field, cite the source for your change in the "Description of edit" field.
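The two accepted forms of this field can be read as a (first year, last year) pair, as in the sketch below. The function name `parse_composition_date` is hypothetical, and the check that the first year does not exceed the last is an assumption implied by the "firstyear~lastyear" naming rather than a documented rule.

```python
def parse_composition_date(value):
    """Return (first_year, last_year) for a single year or a year range."""
    first, separator, last = value.strip().partition("~")
    start = int(first)
    # A single year is treated as a range that starts and ends in that year.
    end = int(last) if separator else start
    if start > end:
        raise ValueError(f"first year {start} is later than last year {end}")
    return start, end


print(parse_composition_date("1368"))       # (1368, 1368)
print(parse_composition_date("1368~1398"))  # (1368, 1398)
```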

Texts, works, and worksets

In the Chinese Text Project system and documentation:

A "text" is a textual transcription based upon one particular edition of some piece of writing (e.g. ctp:wb250388, a transcription based on the 欽定四庫全書 edition of the 尚書全解).
A "work" is an abstraction of a text independent of its realization in any particular edition (e.g. "the 尚書全解", as expressed in such texts as ctp:wb250388 and others).
A "workset" is a set of texts all of which correspond to the same work (e.g. those texts listed under ctp:work:wb250388).
A "representative text" is one text from a workset that can be considered representative of the work expressed by texts in the workset. This should normally be the highest quality text of the workset, as determined by aspects such as completeness, accuracy of transcription, alignment to the base text, punctuation, etc.

Texts on the Chinese Text Project are the textual resources most commonly interacted with, e.g. in the textual database and wiki. In many cases, the system includes more than one text corresponding to a single work. In order to record and manage the relationships between texts and works, the "workset" concept is applied; this is implemented technically by use of the WORKSET tag. Only texts which are similar should be grouped together using WORKSET tags. Texts are similar in the relevant sense if and only if their contents are substantially similar. Commentary is considered part of the text. Therefore:

A text without commentary is a distinct work from another text which contains the contents of the first text together with a substantial amount of commentary.
Two different commentaries on the same text will typically be two distinct works.
Where commentary is sparse or insubstantial, and especially where it is also unattributed, a commented text may be considered similar to an uncommented or differently commented text.

Also note that similarity is defined in terms of substantially similar content. Therefore:

Two texts may have different titles yet still represent the same work.
Two texts may have the same title and author, yet still represent two distinct works.

When adding WORKSET tags, please note that:

Every text belongs to one and only one workset.
If no WORKSET tag is present, a text belongs to its own workset, implicitly derived from its URN. For example, ctp:wb153836 (based on the 欽定四庫全書 edition of the 道德指歸論) has no WORKSET tag, and therefore belongs to the workset ctp:work:wb153836. A text with no WORKSET tag is considered the representative text of its workset. It follows that representative texts never have a WORKSET tag.
A WORKSET tag should never specify the URN of the text to which it applies (e.g. ctp:wb153836 should not have the tag "WORKSET(ctp:wb153836)").
A WORKSET(urn) tag specifies that a text is an instance of the same work as the text specified by urn. The text specified by urn should be the representative text of the work, and therefore should not itself have any WORKSET tag.
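The rules above can be expressed as two small checks. This is a sketch under assumptions: the function names are invented, WORKSET tags are modelled as a plain dictionary from a text's URN to the URN named in its tag (texts without a tag are simply absent), and deriving a tagged text's workset URN from its tag's target is an inference from the rules above rather than a documented algorithm.

```python
PREFIX = "ctp:"


def workset_urn(text_urn, workset_tags):
    """Return the workset URN a text belongs to."""
    # Without a WORKSET tag, a text is the representative of its own workset.
    representative = workset_tags.get(text_urn, text_urn)
    if not representative.startswith(PREFIX):
        raise ValueError(f"unexpected URN: {representative!r}")
    return "ctp:work:" + representative[len(PREFIX):]


def check_workset_tags(workset_tags):
    """Return (text_urn, problem) pairs for tags that break the rules above."""
    problems = []
    for text_urn, target in workset_tags.items():
        if target == text_urn:
            problems.append((text_urn, "WORKSET tag names the text itself"))
        elif target in workset_tags:
            problems.append((text_urn, "target has its own WORKSET tag"))
    return problems


print(workset_urn("ctp:wb153836", {}))  # ctp:work:wb153836
```

Because a representative text never carries a WORKSET tag, a single dictionary lookup is enough to find the workset; no chain of tags ever needs to be followed.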