Chinese Text Project
Wiki style and formatting
Structure
Texts in the Wiki section of the site are composed of one or more sections. Each section has a title (e.g. "里仁"), a sequence number (e.g. "4"), and content (e.g. the contents of the 里仁 chapter of the Analects). Usually sections correspond to divisions in the original text - typically 篇 or 卷. Where the original text is very short, the Wiki text will have only a single section, with the same title as the text itself (e.g. "三字经"). Each section should contain all text and any subsections belonging to that section. For example:
Wiki text title: 论语
...
Wiki text title: 三字经
A single section can contain multiple subsections marked up as described below, provided that this does not make the section overly long (i.e. longer than can easily be displayed on a single web page):
Wiki text title: 玉堂丛语
...
Editions
To ensure that the Wiki is edited constructively, and that repeated edits by different individuals improve the quality of the text, each Wiki text should specify the edition on which it is based. When editing a text, make sure the Wiki version agrees with the characters used in the base edition. If you believe that the base text itself is incorrect and must be corrected to make it intelligible, please do not simply change the Wiki text so that it disagrees with the base text and agrees with your view of what the corrected text should say (or with what another edition says). Instead, please use the correction markers to indicate that the base text says one thing but should be emended to another. For example:
If you are editing the Wiki text because it incorrectly disagrees with the base text (which you have checked in the library), please copy and paste the URL of the library page into the "Description" field if possible so that anyone reviewing your edit can easily see why you made the changes.
Punctuation
The following punctuation characters should be used in texts on the Wiki:
Characters | Usage |
---|---|
。 | Period. Used to separate sentences. |
, | Comma. Used to separate verbal phrases. |
、 | List marker. Used to separate noun phrases. |
! | Exclamation mark. |
? | Question mark. |
「 」 “ ” | Used to enclose quotations. Note that “ and ” are automatically normalized to 「 and 」 when an edit is submitted. |
『 』 ‘ ’ | Used to enclose quotations within other quotations. Note that ‘ and ’ are automatically normalized to 『 and 』 when an edit is submitted. |
【 】 | Used to indicate that the text contained is a categorization applying to the immediately following paragraph (e.g. "【疏】"). |
《 · 》 | Used to indicate that the text contained is a reference to a book or chapter title. Where composed of both a book and chapter title, "·" can be placed between the two. Only used when citing other books or chapters; chapter and section headings of the book being entered should not include these marks, as they will be added automatically. |
● | Used to indicate that the digital edition is missing one character that appears in the base text (e.g. because its Unicode representation is unknown or does not exist). |
□ | Used to indicate that the base text itself is missing one character, either because this is explicitly indicated in the base text in some way, or because the base text is physically damaged at this location. |
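The quotation-mark normalization mentioned in the table can be pictured as a plain character mapping. The Python sketch below is illustrative only: `normalize_quotes` and `QUOTE_MAP` are hypothetical names, and the site's actual implementation is not described in this document.

```python
# Hypothetical helper illustrating the normalization described in the table.
# Keys are the curly quotation marks; values are the corner brackets.
QUOTE_MAP = str.maketrans({
    "\u201c": "\u300c",  # “ becomes 「
    "\u201d": "\u300d",  # ” becomes 」
    "\u2018": "\u300e",  # ‘ becomes 『
    "\u2019": "\u300f",  # ’ becomes 』
})


def normalize_quotes(text: str) -> str:
    """Return text with curly quotation marks replaced by corner brackets."""
    return text.translate(QUOTE_MAP)
```

Text that already uses 「 」 and 『 』 passes through unchanged, so applying the function more than once has no further effect.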
Characters with functional roles
Certain characters are used in the Wiki to specify particular functional roles of sections of text. These characters must only be used as described in the following table:
Characters | Usage | Example |
---|---|---|
* | Used at the start of a line, specifies that the line is a title at the highest level within the currently edited section. Note that this is not required for the title of the section itself, which must be entered in the "Title" field. | *记游 |
** | Used at the start of a line, specifies that the line is a title at the second-highest level within the currently edited section. | **记过合浦 |
{ } | Specifies that the characters between the { and } are printed in larger text. Typically used when a text is primarily a commentary to indicate sections of the original text. | {染于苍则苍,}《广雅释器》云:「苍、青也。」 |
{{ }} | Specifies that the characters between the {{ and }} are commentary and should be printed in smaller text. Typically used when a text is not primarily a commentary, but contains comments distinguished as such in the base text. | 《河图》曰:元气闓{{音开}}阳为天。 |
{{{ }}} | Specifies that the characters between the {{{ and }}} are marginal notes or textual remarks not occurring in the main body of the text. | 王,{{{王旧作命。改之}}}曰:乌宓,父师, |
[return] | Separates paragraphs. Paragraphs should be complete meaningful units of text, and should never end with the marks “ ‘ : , or 、. | |
| | Specifies that a line-break should be included where the mark occurs, but that this should not be treated as a paragraph break. | 关关雎鸠、在河之洲。|窈窕淑女、君子好逑。 |
●=Description= | Used to indicate that the digital edition is missing one character that appears in the base text, and provide a description of the character. | 山林谁问●=上「髟」下「丐」=萧萧。 |
【 】 | Specifies that the characters between 【 and 】 should be displayed inverted (i.e. white on black). Use only as indicated in the specified base text. | 【指归】 |
〖 〗 | Specifies that the characters between 〖 and 〗 should be displayed inside a solid border. Use only as indicated in the specified base text. | 〖指归〗 |
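To make the three brace markers concrete, the following Python sketch splits a line into plain text, larger text ({ }), commentary ({{ }}), and marginal notes ({{{ }}}). It is a rough illustration, not the site's parser: `split_roles` and the role names are hypothetical, and the sketch assumes that markers are never nested and always closed on the same line.

```python
import re

# Try the longest marker first so "{{{" is not misread as "{{" or "{".
MARKER = re.compile(
    r"\{\{\{(?P<marginal>.*?)\}\}\}"
    r"|\{\{(?P<commentary>.*?)\}\}"
    r"|\{(?P<large>.*?)\}"
)


def split_roles(line: str) -> list[tuple[str, str]]:
    """Split a line into (role, text) pairs; unmarked text gets role 'plain'."""
    parts = []
    pos = 0
    for match in MARKER.finditer(line):
        if match.start() > pos:
            parts.append(("plain", line[pos:match.start()]))
        role = match.lastgroup  # name of the alternative that matched
        parts.append((role, match.group(role)))
        pos = match.end()
    if pos < len(line):
        parts.append(("plain", line[pos:]))
    return parts
```

For instance, `split_roles("元气闓{{音开}}阳为天。")` yields a plain run, a commentary run containing 音开, and a final plain run.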
XML tags
The following tags are used to implement various additional functions. In most cases these are maintained automatically through normal editing functions; please do not create or modify these tags by hand except where absolutely necessary.
XML tag | Purpose |
---|---|
<scanbegin.../> | Aligning transcription and scan. |
<scanbreak.../> | Aligning transcription and scan. |
<scanend.../> | Aligning transcription and scan. |
<picture.../> | Including an illustration in a text. |
<character.../> | Including a non-Unicode character in a text. |
<entity...>...</entity> | Marking a named entity (e.g. person name) in a text. |
Uploading new resources from scratch
When uploading a completely new text using the upload page, regardless of the length of the text or its sections, you should enter the text using heading markers as if the entire text were composed of a single section. The system will automatically split it into sections according to the title markers entered; the highest-level markers are treated as section titles. So for instance, if uploading the full text of the Analects, the data to be input would be:
*学而
子曰:「学而时习之,不亦说乎?...
...
子曰:「不患人之不己知,患不知人也。」
*为政
子曰:「为政以德,譬如北辰,居其所而众星共之。」...
...
*尧曰
尧曰:「咨!尔舜!天之历数在尔躬。...
...
This would create a new text composed of twenty sections, one for each chapter of the Analects.
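The splitting behaviour can be approximated with a few lines of Python. This is a minimal sketch under stated assumptions, not the upload page's actual code: `split_sections` is a hypothetical name, only lines starting with a single "*" open a new section, lines starting with "**" stay inside the current section as subsection titles, and any text before the first "*" line is simply dropped here because the document does not say how the system treats it.

```python
def split_sections(upload: str) -> list[tuple[str, str]]:
    """Split uploaded text into (title, content) pairs at top-level '*' lines."""
    sections = []
    title = None
    body = []
    for line in upload.splitlines():
        is_top_level = line.startswith("*") and not line.startswith("**")
        if is_top_level:
            if title is not None:
                sections.append((title, "\n".join(body)))
            title = line[1:].strip()
            body = []
        elif title is not None:
            # Ordinary text and lower-level titles belong to the open section.
            body.append(line)
    if title is not None:
        sections.append((title, "\n".join(body)))
    return sections
```

Applied to an upload whose top-level markers are the twenty chapter titles of the Analects, this sketch returns twenty pairs, one per chapter.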
Metadata
Textual resources in the Wiki section of the site incorporate metadata describing the text, accessible via the "Edit" link from the contents page of the text. When making changes to this metadata, please pay attention to the following aspects in particular:
Tags
The "Tags" field may include data describing how the text should be treated within the Chinese Text Project system. The tags field should not contain any data not described in the table below. If more than one tag is present, tags should be delimited using ",".
Tag | Meaning |
---|---|
TEXTDB | The text is a resource from the textual database (as opposed to the Wiki). No Wiki resource should have this tag present. |
WORKSET(urn) | This text represents an instance of the same abstract work as the text specified by the URN urn. urn must be different from the URN for the text to which this tag applies. The text referenced by urn should be the default representative text of this work. |
OCR_PRIMARY | The text is an OCR-derived transcription, and no proofread edition of the same text was available when it was transcribed. |
OCR_SECONDARY(textid) | The text is an OCR-derived transcription, and another edition of the same text (specified by textid) was available when it was transcribed. |
OCR_MATCH | The text results from a proofread transcription that has been automatically matched to the corresponding scanned text. |
OCR_FAILEDMATCH(textid) | The text is an OCR-derived transcription, and another edition of the same text (specified by textid) was available when it was transcribed. This textual transcription was created because it was not possible to match the existing transcription to the scanned text. |
OCR_CORRECTED | The text is an OCR-derived transcription, but its contents have now been substantially proofread and corrected. |
OCR_CORRECTED(nn) | The text is an OCR-derived transcription, but approximately nn% of its contents have been proofread and corrected. |
REDIRECT(urn) | The text has been replaced with or merged into the text identified by the URN urn. |
FORK(urn) | This text was originally created as a copy of the text identified by the URN urn. |
PUNCTUATED | The text has modern punctuation as described in this document throughout. |
PUNCTUATED_OLD | The text has old-style punctuation throughout. |
ANNOTATED | The text contains at least one semantic annotation. |
Composition date
The "Composition date" field specifies the dates of authorship of a text. This can either be a single number representing the year in which the text was composed, or a range of years expressed in the format "firstyear~lastyear". In the latter case, this represents either uncertainty as to when precisely the text was composed, or knowledge that the text was composed over an extended time period contained within the two specified years. When changing this field, cite the source for your change in the "Description of edit" field.
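The two accepted forms of the field can be read with a short Python sketch such as the one below. `parse_composition_date` is a hypothetical name, and the check that the last year is not earlier than the first is an added assumption rather than a documented rule. How years before the common era are written is not stated here; the sketch simply relies on `int()`.

```python
def parse_composition_date(value: str) -> tuple[int, int]:
    """Return (first_year, last_year); a single year yields an equal pair."""
    first, separator, last = value.strip().partition("~")
    start = int(first)
    end = int(last) if separator else start
    if end < start:
        raise ValueError("last year precedes first year")
    return start, end
```

A single year is returned as a range of length zero, so later code can treat both forms uniformly.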
Texts, works, and worksets
In the Chinese Text Project system and documentation:
- A "text" is a textual transcription based upon one particular edition of some piece of writing (e.g. ctp:wb250388, a transcription based on the 钦定四库全书 edition of the 尚书全解).
- A "work" is an abstraction of a text independent of its realization in any particular edition (e.g. "the 尚书全解", as expressed in such texts as ctp:wb250388 and others).
- A "workset" is a set of texts all of which correspond to the same work (e.g. those texts listed under ctp:work:wb250388).
- A "representative text" is one text from a workset that can be considered representative of the work expressed by texts in the workset. This should normally be the highest quality text of the workset, as determined by aspects such as completeness, accuracy of transcription, alignment to the base text, punctuation, etc.
Texts on the Chinese Text Project are the textual resources most commonly interacted with, e.g. in the textual database and wiki. In many cases, the system includes more than one text corresponding to a single work. In order to record and manage the relationships between texts and works, the "workset" concept is applied; this is implemented technically by use of the WORKSET tag. Only texts which are similar should be grouped together using WORKSET tags. Texts are similar in the relevant sense if and only if their contents are substantially similar. Commentary is considered part of the text. Therefore:
- A text without commentary is a distinct work from another text which contains the contents of the first text together with a substantial amount of commentary.
- Two different commentaries on the same text will typically be two distinct works.
- Where commentary is sparse or insubstantial, and especially where it is also unattributed, a commented text may be considered similar to an uncommented or differently commented text.
- Two texts may have different titles yet still represent the same work.
- Two texts may have the same title and author, yet still represent two distinct works.
When adding WORKSET tags, please note that:
- Every text belongs to one and only one workset.
- If no WORKSET tag is present, a text belongs to its own workset, implicitly derived from its URN. For example, ctp:wb153836 (based on the 钦定四库全书 edition of the 道德指归论) has no WORKSET tag, and therefore belongs to the workset ctp:work:wb153836. A text with no WORKSET tag is considered the representative text of its workset. It follows that representative texts never have a WORKSET tag.
- A WORKSET tag should never specify the URN of the text to which it applies (e.g. ctp:wb153836 should not have the tag "WORKSET(ctp:wb153836)").
- A WORKSET(urn) tag specifies that a text is an instance of the same work as the text specified by urn. The text specified by urn should be the representative text of the work, and therefore should not itself have any WORKSET tag.
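The rules above can be summarized in a small Python sketch that derives the workset URN for a text. It is one reading of these rules, not the system's implementation: `workset_urn` is a hypothetical function, the `tags` argument is assumed to be a dictionary from tag name to argument, and the URN "ctp:wb999" in the usage line is made up for illustration.

```python
def workset_urn(text_urn: str, tags: dict[str, str]) -> str:
    """Derive the workset URN of a text from its own URN and its tags."""
    target = tags.get("WORKSET")
    if target is None:
        # No WORKSET tag: the text is the representative of its own workset.
        representative = text_urn
    elif target == text_urn:
        raise ValueError("a WORKSET tag must not reference its own text")
    else:
        # The tag names the representative text of the shared workset.
        representative = target
    prefix, _, identifier = representative.partition(":")
    if prefix != "ctp" or not identifier:
        raise ValueError(f"unexpected URN: {representative!r}")
    return f"ctp:work:{identifier}"
```

With no tag, `workset_urn("ctp:wb153836", {})` gives "ctp:work:wb153836", matching the example above; a hypothetical text "ctp:wb999" tagged `WORKSET(ctp:wb250388)` would resolve to "ctp:work:wb250388".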