Chinese Telegraph Code (CTC)

The Chinese telegraphic code book, compiled in 1911, was used for sending Chinese characters over the electrical (hand-keyed, Morse) telegraph. For each of approximately 9800 characters (arranged in dictionary order) a 4 digit code number was assigned. The code is presented in the form of 100 one page charts, each with a 10 by 10 grid of cells, each cell containing a Chinese character and its 4 digit code number. The page number matches the first two code number digits, the row number matches the third code digit, and the column number matches the fourth code digit. This code is variously called ``dian4bao4ma3,'' ``Standard Telegraphic Code,'' ``Chinese Commercial Telegraph Code,'' ``Chinese Commercial Code,'' and ``telecode,'' and I have seen it abbreviated STC, CCC, and CTC.
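The page/row/column layout described above can be sketched in a few lines of Python. This is only an illustration: the function names are my own, and it assumes that pages are numbered 00 through 99 and that rows and columns are numbered 0 through 9, which is what the digit-matching rule implies but which I have not checked against a printed book.

```python
def ctc_position(code: str) -> tuple[int, int, int]:
    """Split a 4-digit CTC code into (page, row, column).

    Assumes page = first two digits, row = third digit, column = fourth digit.
    """
    if len(code) != 4 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"not a 4-digit code: {code!r}")
    return int(code[:2]), int(code[2]), int(code[3])


def ctc_code(page: int, row: int, column: int) -> str:
    """Rebuild the 4-digit code from a chart position (inverse of ctc_position)."""
    if not (0 <= page <= 99 and 0 <= row <= 9 and 0 <= column <= 9):
        raise ValueError("page must be 0-99; row and column must be 0-9")
    return f"{page:02d}{row}{column}"


if __name__ == "__main__":
    # Locate each code group of a short message in the code book.
    for group in "7193 1032 4316".split():
        page, row, column = ctc_position(group)
        print(f"{group}: page {page:02d}, row {row}, column {column}")
```

This only tells you where to look in the book; which character sits in that cell depends on the edition.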

One informant tells me that the term ``Standard'' refers specifically to the PRC version of the code.

It is not the same thing as the ``quwei'' system of row and column numbers, each in the range 1 through 94.
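For contrast, here is a minimal sketch of quwei numbering, assuming that ``quwei'' means the GB2312 row/column scheme, in which adding 0xA0 to each of the two numbers gives the two bytes of the usual GB (EUC-CN) encoding. The function names are my own; the decoding relies on Python's built-in gb2312 codec.

```python
def quwei_to_char(row: int, column: int) -> str:
    """Return the GB2312 character at a quwei position (row and column 1-94)."""
    if not (1 <= row <= 94 and 1 <= column <= 94):
        raise ValueError("row and column must each be in the range 1 through 94")
    # Each number is offset by 0xA0 to form one byte of the two-byte GB encoding.
    return bytes([0xA0 + row, 0xA0 + column]).decode("gb2312")


def char_to_quwei(char: str) -> tuple[int, int]:
    """Return the (row, column) quwei position of a single GB2312 character."""
    encoded = char.encode("gb2312")
    if len(encoded) != 2:
        raise ValueError("not a two-byte GB2312 character")
    return encoded[0] - 0xA0, encoded[1] - 0xA0


if __name__ == "__main__":
    print(quwei_to_char(16, 1))
    print(char_to_quwei(quwei_to_char(16, 1)))
```

Positions with no assigned character raise UnicodeDecodeError. Nothing here relates a quwei position to a CTC number; that requires a lookup table such as telecode.gb.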

In an article ``Danish Watchmaker Created the Chinese Morse System'', Morsum Magnificat, April 1997, Kurt Jacobsen describes the introduction of the telegraph to China in 1871 by the Great Northern Telegraph Company, a Danish firm, along with a code book. The notion of using numbers to stand for characters arranged in radical/stroke-count dictionary order was due to Hans Schjellerup, whose book contained 5454 characters. The idea of arranging the characters in a grid was due to M. S. A. Viguier, who was at the same time working on a rival code book. Schjellerup's 1871 code book, according to Jacobsen, evolved through various editions into the modern CTC. A copy of this book is in the Danish National Archives; it would be interesting to see how closely it matches 20th century editions of the CTC. The same article also shows part of a page of an 1890's GNTC code book, clearly showing number/character mappings identical with those in mid 20th century editions of the CTC.

When the electrical telegraph fell out of use, CTC did too. (But one informant tells me it is used by ham radio operators in Taiwan.) Similar schemes mapping characters to numbers (notably GB2312-80 and ``Big 5'') are used nowadays in Asian language text processing computer applications.

Versions of the CTC on the Web

I found many copies of a web page telecode.gb (viewable if GB is enabled in your browser) establishing a partial mapping between CTC and GB. Using this data and the gb2ps software of W. Sun (william@cs.anu.edu.au), I made a Postscript approximation to a printed copy of the Chinese telegraph code book. (Warning: I don't know a word of Chinese, and have very little way of checking what I have done. I know there are some gaps in the code, that is, 4 digit CTC code numbers which in fact do stand for characters but are not given in telecode.gb. Could this be because those characters are in the traditional character lists but not in the GB2312-80 list of simplified characters?)

As a test, what does 7193 1032 4316 mean?

The Unicode consortium maintains a file Unihan.txt listing both ``PRC telegraph code'' and ``Taiwan telegraph code'' numbers. These are not identical with each other, nor with the values in the ``GB'' version discussed above. Using this Unicode-derived data I made Postscript ``reconstructions'' of PRC and Taiwan CTC books, showing only those characters in my computer's CNS 11643 font. Thomas Chan at Ohio State University compiled HTML versions, viewable if your browser has the right encodings enabled: PRC version in GB coding and the Taiwan version in Big 5 coding.
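A minimal sketch of pulling the two telegraph code tables out of Unihan-style data follows. It assumes the tab-separated layout used by the Unihan database (code point, field name, value, with comment lines starting with #) and the field names kMainlandTelegraph and kTaiwanTelegraph. The function name and the sample lines are my own illustration and should be checked against the real file.

```python
TELEGRAPH_FIELDS = ("kMainlandTelegraph", "kTaiwanTelegraph")

# Illustrative lines in the Unihan layout (assumed sample, not a verified excerpt).
SAMPLE_LINES = [
    "# comment lines start with a hash mark",
    "U+4E00\tkMainlandTelegraph\t0001",
    "U+4E00\tkTaiwanTelegraph\t0001",
    "U+4E00\tkMandarin\tyi",
]


def parse_telegraph_fields(lines):
    """Build {field name: {4-digit code: character}} from Unihan-style lines."""
    tables = {field: {} for field in TELEGRAPH_FIELDS}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip anything that is not codepoint/field/value
        codepoint, field, value = parts
        if field in tables:
            char = chr(int(codepoint[2:], 16))  # "U+4E00" -> the character itself
            for code in value.split():  # tolerate several space-separated codes
                tables[field][code] = char
    return tables


if __name__ == "__main__":
    tables = parse_telegraph_fields(SAMPLE_LINES)
    for field, table in tables.items():
        print(field, table)
```

With the real file, one would pass an open file object instead of SAMPLE_LINES. Comparing the two resulting dictionaries code by code shows where the PRC and Taiwan versions differ.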

The Unicode tabulation, according to a personal communication from Unicode expert Lee Collins, was compiled from a Japanese source, Kanzi denpou koudo henkan-hyou [=Chinese character telegraph code conversion table], Lin Jinyi, KEC (KDD Engineering and Consulting), Tokyo, 1984. That work in turn cites Biaozhun dianma-ben (xiuding-ban) of March 1983 for the mainland version and Dianma xinbian of January 1976 for the Taiwan version. Collins reports a different source as specifying the publisher of the Taiwan version as ``Dianxin-ju (Telecommunications Office).''

Another version of the CTC is available on the web, courtesy of John Savard. It differs even more from the three versions above. Unfortunately, Savard has not been able to supply any precise source for his data.

Printed Versions of the CTC

The CTC has remained in print pretty much all through the 20th century. Library catalogues list many different editions. I do not know if it is still in print. (I own only a copy of one odd-ball edition, titled Korean Telegraphic Code Book, with characters arranged by sounds in English alphabetic order according to the McCune-Reischauer system of transliteration, published in the 1950's. It has two sections. The first is arranged in pronunciation order, showing the Chinese character, code number, and Korean pronunciation. The second section (the ``decoding section'') is listed in numerical order, showing code number and pronunciation, but not the character. So it's hard to use this book to check any of the electronic versions of the CTC mentioned above.)

One page of an unnamed edition of the CTC is reproduced on p.138 of Alan Stripp's Code Breaker in the Far East. It shows, in addition to the 4 digit code numbers, three letter code trigraphs. These same trigraph assignments also appear in my Korean CTC. (Savard's web page also shows code trigraphs; his disagree with those in my Korean CTC but agree with those in an undated Hong Kong edition of the CTC (Tsui-hsin piao-chun tien-ma-pen [= Latest standard telegraphic code]) I saw in a library.) Trigraph based commercial telegraphic codes became popular in the 1920's and 1930's, when they were typically sent concatenated: three trigraphs plus one check letter to make one 10-letter ``word''. This would have represented a big saving over sending the corresponding 12 digits. So my guess is that the trigraphs were added in that period, especially for use over overseas cables, whose rate structure discouraged the use of number codes. (The Hong Kong edition mentioned above shows an example of this, without check letters: coding a sequence of five characters as five trigraphs which are then re-chunked into three five-letter groups.)
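The re-chunking in the Hong Kong example can be sketched as follows. The trigraphs used here are made-up placeholders, not real CTC assignments, and the function names are my own. No check letter is computed, since I do not know the rule by which check letters were formed.

```python
def rechunk(trigraphs, group_size=5):
    """Concatenate trigraphs and cut the letters into fixed-size groups."""
    letters = "".join(trigraphs)
    return [letters[i:i + group_size] for i in range(0, len(letters), group_size)]


def unchunk(groups):
    """Recover the trigraphs from transmitted groups (inverse of rechunk)."""
    letters = "".join(groups)
    if len(letters) % 3 != 0:
        raise ValueError("total letter count is not a multiple of three")
    return [letters[i:i + 3] for i in range(0, len(letters), 3)]


if __name__ == "__main__":
    # Five placeholder trigraphs standing for five characters (hypothetical values).
    trigraphs = ["ABC", "DEF", "GHI", "JKL", "MNO"]
    groups = rechunk(trigraphs)
    print(groups)           # three five-letter groups
    print(unchunk(groups))  # back to the five trigraphs
```

Five characters thus take 15 letters in three groups, where the numeric code would need 20 digits in five groups. The 10-letter ``word'' described above is the same idea applied to three characters: 9 letters plus a check letter, against 12 digits.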

There was, evidently, some variation between various editions of the CTC. This is not unexpected, given the very long time this code was in use and given the Chinese political situation. Collins, summarizing from the preface to his Japanese source, reports that KDD published it to standardize interchange of telegrams between Japan, China, and Taiwan, since the two Chinese standards had diverged as a result of the script simplification on the mainland and the addition of characters. The earlier standard defined characters up to code point 7902 in traditional Kangxi radical-stroke order. After the revolution the standard diverged. Code points above 7902 were added, incompatibly, in both versions. Below 7902, the PRC eliminated a number of characters or replaced them with simplified glyphs. The general ordering of the PRC version did not change, although some of the simplified characters are now out of order since their stroke count changed.

I would very much like to know the history of editions, versions, and revisions of the CTC. (I mean both the standard bibliographic facts: publishers, places and dates of publication, reprints and reissues; and the coding facts: changes to the code, date of introduction of the trigraphs, governing authorities responsible for the code, etc.)

Many thanks for information to: Thomas Chan, Lee Collins, Dan Jacobson, John Jenkins, John Savard, Mok-Kong Shen, Tony Smith, and an anonymous informant.

Jim Reeds
If you spot any errors or have any questions or suggestions, please send me an email.

Last modified 21 Dec 2004.