Constructing Chinese websites


When building a Chinese website, a webmaster will inevitably run into problems of both a cultural and a technical nature. Chinese tastes in design and overall layout, for example, are by no means the same as European or American tastes. On the technical side, by far the most substantial difference lies in the script, which requires a completely different encoding from sites that use Latin letters.

Encoding – Theory

Text stored on a computer is built from two basic units, bits and bytes. A bit can have two states, 0 and 1. Eight bits make one byte, so a byte is in effect an eight-digit number made up only of 0s and 1s (e.g. 01110101). Combining 0s and 1s in this way yields 2^8 = 256 different eight-digit numbers, so a byte can take 256 different values. In the simplest encodings, one byte corresponds to one character, be it a letter or a special character such as ä, á or å.
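The arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch; the variable names are chosen here and are not part of the article.

```python
# Interpret the eight-digit pattern from the text as a base-2 number.
bits = "01110101"
value = int(bits, 2)
print(value)          # 117

# Eight positions with two states each give 2**8 combinations.
byte_count = 2 ** 8
print(byte_count)     # 256

# The other direction: write a number as an eight-digit bit pattern.
pattern = format(97, "08b")
print(pattern)        # 01100001
```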

The International Organization for Standardization (ISO) collected these 256 values in a table, ISO 8859-1, which is widely used in Europe and covers the common Western European characters: the alphabet, German umlauts, French accented characters, Spanish characters with a tilde, and symbols used in business and science. For simplicity, the table does not list the full bit pattern of each byte but a shortened form. For example, the byte with the bit pattern 01100001 is assigned the number 97, which in turn is defined as the code for the character "a" (01100001 = 97 = a).
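As a small illustration of the lookup 01100001 = 97 = a, the following Python sketch uses the standard "iso-8859-1" codec. The final check, that a Chinese character has no place in this table, is an addition made here to show why a different table is needed later on.

```python
# The single byte 01100001 from the example in the text.
raw = bytes([0b01100001])
number = raw[0]
letter = raw.decode("iso-8859-1")
print(number, letter)     # 97 a

# An accented letter also fits into one byte of this table.
umlaut_code = "ä".encode("iso-8859-1")[0]
print(umlaut_code)        # 228

# A Chinese character has no entry in ISO 8859-1, so encoding fails.
try:
    "中".encode("iso-8859-1")
    fits = True
except UnicodeEncodeError:
    fits = False
print(fits)               # False
```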

However, ISO 8859-1 is not the only table. Other examples are the MS-DOS code tables and the Windows code tables. Different tables may contain the same characters but assign them different codes, and they may also contain different characters altogether, since MS-DOS and Windows work with different character sets. There are therefore many different tables, each chosen according to the characters needed and the purpose at hand.
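To make the difference between tables concrete, the sketch below decodes one and the same byte with three tables. The choice of cp437 as a representative MS-DOS table and cp1252 as a representative Windows table is an assumption made for this example; the article does not name specific code pages.

```python
# One byte, three tables (cp437 and cp1252 are example choices).
byte = b"\xe4"
tables = ("iso-8859-1", "cp1252", "cp437")

decoded = {table: byte.decode(table) for table in tables}
for table, char in decoded.items():
    print(table, char)
# iso-8859-1 ä
# cp1252 ä
# cp437 Σ

# The same character can also sit at different positions.
positions = {table: "ä".encode(table)[0] for table in tables}
print(positions)
# {'iso-8859-1': 228, 'cp1252': 228, 'cp437': 132}
```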

Building a Chinese website works no differently. The first step is to declare in the source code which table is to be used, namely one that contains codes for Chinese characters. The browser then knows which encoding to apply in order to display the correct characters.

Encoding – Conversion

For Chinese today, one has to distinguish between traditional and simplified characters. Traditional characters are still widely used in Hong Kong and Taiwan, whereas mainland China uses simplified characters. Two different code tables are therefore needed. Big5 is the most widely used encoding for traditional characters, while GB2312 (GuoBiao 2312, GB for short) has become the most popular encoding for simplified characters.
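The following Python sketch shows how the two tables assign different byte values to the same text, and what happens when bytes are read with the wrong table. The sample string is chosen here because both characters exist in GB2312 and in Big5.

```python
text = "中文"  # both characters exist in GB2312 and in Big5

gb_bytes = text.encode("gb2312")
big5_bytes = text.encode("big5")
print(gb_bytes.hex())     # d6d0cec4
print(big5_bytes.hex())   # a4a4a4e5

# Reading GB2312 bytes with the Big5 table does not give back the text.
misread = gb_bytes.decode("big5", errors="replace")
print(misread == text)    # False

# Reading them with the matching table does.
print(gb_bytes.decode("gb2312") == text)   # True
```

This mismatch is why the browser has to be told which table a page uses.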

A third, more modern table, Unicode, is based on a different principle from the one explained above: it brings most of the world's languages together in a single table, including both traditional and simplified Chinese. Unicode is steadily becoming more popular. However, because not all browsers support Unicode at present, we still have to deal with Big5 and GB.
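As a minimal sketch of the "one table for everything" idea, the code below stores a simplified and a traditional character side by side using UTF-8, one common byte encoding of Unicode. The helper can_encode is a hypothetical name introduced for this example. Its results for the two legacy tables are printed but deliberately not stated here, since they depend on the codec's exact coverage.

```python
simplified = "国"    # simplified form of "country"
traditional = "國"   # traditional form of the same word
mixed = simplified + traditional

# Each character has its own number (code point) in the Unicode table.
print(hex(ord(simplified)), hex(ord(traditional)))   # 0x56fd 0x570b

# UTF-8 can store both forms in the same text.
data = mixed.encode("utf-8")
print(len(data))                        # 6
print(data.decode("utf-8") == mixed)    # True


def can_encode(text: str, table: str) -> bool:
    """Return True if every character of text exists in the given table."""
    try:
        text.encode(table)
        return True
    except UnicodeEncodeError:
        return False


# Compare with the two legacy tables discussed in the text.
for table in ("utf-8", "gb2312", "big5"):
    print(table, can_encode(mixed, table))
```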

Users can choose the encoding themselves in the browser settings. However, most popular browsers can identify the encoding automatically, provided the webmaster has stated in the source which encoding is to be used. This is done by inserting a meta tag between <head> and </head>. The tag to set is as follows:

<meta http-equiv="Content-Type" content="text/html; charset=charset_name">

For charset_name, insert the name of the code table, e.g. gb2312 or big5.
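The name declared in the meta tag only helps if the file is actually saved in that encoding. The sketch below, with a hypothetical helper build_page and an assumed http-equiv form of the meta tag, generates a small page and encodes it with the same table it declares.

```python
def build_page(body_text: str, charset_name: str) -> bytes:
    """Build a minimal HTML page and encode it with the declared table.

    build_page is a hypothetical helper for illustration only.
    """
    html = (
        "<html>\n<head>\n"
        '<meta http-equiv="Content-Type" '
        f'content="text/html; charset={charset_name}">\n'
        "</head>\n<body>\n"
        f"<p>{body_text}</p>\n"
        "</body>\n</html>\n"
    )
    # Declared name and actual byte encoding must be the same table.
    return html.encode(charset_name)


gb_page = build_page("中文", "gb2312")
big5_page = build_page("中文", "big5")

print(b"charset=gb2312" in gb_page)            # True
print("中文" in gb_page.decode("gb2312"))       # True
print(gb_page == big5_page)                    # False
```

Writing the returned bytes to a file in binary mode, for example with open("index.html", "wb"), keeps the declaration and the stored bytes in agreement.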

On European sites it is common to set the typeface with a font tag. This is not recommended, however, when the same stylesheets are to be used for both the European and the Chinese version of a site, because the chosen font may not support Chinese characters at all. Internet Explorer recognizes this and switches to another font, while other browsers may fail to do so and display only a row of squares.

Traditionally, Chinese was written vertically, that is, read from top to bottom rather than from left to right, and this is still done today, mainly in Taiwan. Since Internet Explorer 5.5, it has been possible to render Chinese vertically on the web as well. For further information, visit:


msdn.microsoft.com/workshop/author/dhtml/reference/properties/writingmode.asp


If the webmaster uses an authoring tool such as Dreamweaver to build the site, the encoding used for writing and displaying the content can be set directly in the tool's configuration. In Dreamweaver, it should be possible to select the appropriate encoding under Modify -> Page Properties -> Title/Encoding.

Finally, here are some examples of popular Chinese sites:

 

- www.sina.com.cn

- www.sohu.com

- www.tom.com

- www.163.com

- www.etang.com
