Tamil Script Code for Information Interchange

Tamil Script Code for Information Interchange (TSCII) is an 8-bit coding scheme for representing the Tamil script. The lower 128 codepoints are plain ASCII; the upper 128 codepoints are TSCII-specific. After years of use on the Internet by private agreement only, it was registered with the IANA in 2007.[1]

TSCII encodes characters in visual (written) order, paralleling the key order of the Tamil typewriter.

Unicode uses the logical-order encoding strategy for Tamil, following ISCII. This contrasts with Thai, where Unicode adopted the visual-order encoding grandfathered in from TIS-620.
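The difference between visual and logical order can be illustrated with a minimal Python sketch. It assumes that the vowel signs written to the left of a consonant (TSCII codes A6, A7 and A8) are stored before that consonant in TSCII, while Unicode stores them after it. The names TSCII_EXCERPT, PREFIX_SIGNS and visual_to_logical are hypothetical, and the table is only a small excerpt of the code table in this article, not a complete converter.

```python
# Excerpt of the TSCII upper half, copied from the code table in this article.
# Only the few bytes needed for the demonstration are listed (an assumption
# made for brevity; any other upper-half byte raises KeyError).
TSCII_EXCERPT = {
    0xA1: "\u0BBE",  # vowel sign AA, written after the consonant
    0xA6: "\u0BC6",  # vowel sign E, written before the consonant
    0xA7: "\u0BC7",  # vowel sign EE, written before the consonant
    0xA8: "\u0BC8",  # vowel sign AI, written before the consonant
    0xB8: "\u0B95",  # letter KA
}

# Vowel signs that appear to the left of their consonant in written Tamil.
PREFIX_SIGNS = {0xA6, 0xA7, 0xA8}


def visual_to_logical(data: bytes) -> str:
    """Decode TSCII bytes, moving each prefix vowel sign after its consonant."""
    out = []
    pending = ""  # prefix sign still waiting for the character it belongs to
    for byte in data:
        char = chr(byte) if byte < 0x80 else TSCII_EXCERPT[byte]
        if byte in PREFIX_SIGNS:
            out.append(pending)  # flush a stray sign that never met a consonant
            pending = char
        else:
            out.append(char + pending)  # consonant first, then the held sign
            pending = ""
    out.append(pending)
    return "".join(out)


# Visual order: sign E (A6), then KA (B8). Logical order: KA, then sign E.
print([f"U+{ord(c):04X}" for c in visual_to_logical(b"\xa6\xb8")])
# ['U+0B95', 'U+0BC6']
```

A full converter would also have to handle the two-part vowel signs, which surround the consonant in written order; the sketch leaves their parts as separate code points.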

The government of Tamil Nadu endorses its own TAB/TAM standards for 8-bit encoding, and other, older encoding schemes can still be found on the Web.

The free etext collection at Project Madurai uses the TSCII encoding, but has already started to provide Unicode versions.

History

The need for a common encoding for Tamil was felt by members of various mailing-list-based forums in the mid-1990s, as multiple custom-coded fonts were prevalent in those forums. While some of the commercial encodings were more popular than others, they were not accepted by the wider community because of conflicting commercial interests. Most participants accepted Unicode as the future standard, but most desktop systems at that time were still not capable of handling Unicode for Tamil, so an interim 8-bit encoding was required.

A separate mailing list for discussing such encodings (webmasters@tamil.net) was created in 1997. The discussion started with an email written by Dr. K. Kalyanasundaram to the popular Tamil author Sujatha, who headed the committee for standardization of the Tamil keyboard.[2] This forum quickly attracted enthusiastic participants from across the globe, including several prominent Tamil scholars. Archives of these discussions are maintained by INFITT.[3]

After TSCII was published, most of the members of the webmasters@tamil.net mailing list became part of INFITT, a wider initiative to bring standardization and continued development to various areas of Tamil computing.

Codepage layout

TSCII (codes 80 to FF; Unicode values are hexadecimal code points)

Hex  Dec  Unicode              Glyph
80   128  0BE6                 ௦
81   129  0BE7                 ௧
82   130  0BB8 0BCD 0BB0 0BC0  ஸ்ரீ
83   131  0B9C                 ஜ
84   132  0BB7                 ஷ
85   133  0BB8                 ஸ
86   134  0BB9                 ஹ
87   135  0B95 0BCD 0BB7       க்ஷ
88   136  0B9C 0BCD            ஜ்
89   137  0BB7 0BCD            ஷ்
8A   138  0BB8 0BCD            ஸ்
8B   139  0BB9 0BCD            ஹ்
8C   140  0B95 0BCD 0BB7 0BCD  க்ஷ்
8D   141  0BE8                 ௨
8E   142  0BE9                 ௩
8F   143  0BEA                 ௪
90   144  0BEB                 ௫
91   145  2018                 ‘
92   146  2019                 ’
93   147  201C                 “
94   148  201D                 ”
95   149  0BEC                 ௬
96   150  0BED                 ௭
97   151  0BEE                 ௮
98   152  0BEF                 ௯
99   153  0B99 0BC1            ஙு
9A   154  0B9E 0BC1            ஞு
9B   155  0B99 0BC2            ஙூ
9C   156  0B9E 0BC2            ஞூ
9D   157  0BF0                 ௰
9E   158  0BF1                 ௱
9F   159  0BF2                 ௲
A0   160  00A0                 NBSP
A1   161  0BBE                 ா
A2   162  0BBF                 ி
A3   163  0BC0                 ீ
A4   164  0BC1                 ு
A5   165  0BC2                 ூ
A6   166  0BC6                 ெ
A7   167  0BC7                 ே
A8   168  0BC8                 ை
A9   169  00A9                 ©
AA   170  0BD7                 ௗ
AB   171  0B85                 அ
AC   172  0B86                 ஆ
AD   173  (unassigned)
AE   174  0B88                 ஈ
AF   175  0B89                 உ
B0   176  0B8A                 ஊ
B1   177  0B8E                 எ
B2   178  0B8F                 ஏ
B3   179  0B90                 ஐ
B4   180  0B92                 ஒ
B5   181  0B93                 ஓ
B6   182  0B94                 ஔ
B7   183  0B83                 ஃ
B8   184  0B95                 க
B9   185  0B99                 ங
BA   186  0B9A                 ச
BB   187  0B9E                 ஞ
BC   188  0B9F                 ட
BD   189  0BA3                 ண
BE   190  0BA4                 த
BF   191  0BA8                 ந
C0   192  0BAA                 ப
C1   193  0BAE                 ம
C2   194  0BAF                 ய
C3   195  0BB0                 ர
C4   196  0BB2                 ல
C5   197  0BB5                 வ
C6   198  0BB4                 ழ
C7   199  0BB3                 ள
C8   200  0BB1                 ற
C9   201  0BA9                 ன
CA   202  0B9F 0BBF            டி
CB   203  0B9F 0BC0            டீ
CC   204  0B95 0BC1            கு
CD   205  0B9A 0BC1            சு
CE   206  0B9F 0BC1            டு
CF   207  0BA3 0BC1            ணு
D0   208  0BA4 0BC1            து
D1   209  0BA8 0BC1            நு
D2   210  0BAA 0BC1            பு
D3   211  0BAE 0BC1            மு
D4   212  0BAF 0BC1            யு
D5   213  0BB0 0BC1            ரு
D6   214  0BB2 0BC1            லு
D7   215  0BB5 0BC1            வு
D8   216  0BB4 0BC1            ழு
D9   217  0BB3 0BC1            ளு
DA   218  0BB1 0BC1            று
DB   219  0BA9 0BC1            னு
DC   220  0B95 0BC2            கூ
DD   221  0B9A 0BC2            சூ
DE   222  0B9F 0BC2            டூ
DF   223  0BA3 0BC2            ணூ
E0   224  0BA4 0BC2            தூ
E1   225  0BA8 0BC2            நூ
E2   226  0BAA 0BC2            பூ
E3   227  0BAE 0BC2            மூ
E4   228  0BAF 0BC2            யூ
E5   229  0BB0 0BC2            ரூ
E6   230  0BB2 0BC2            லூ
E7   231  0BB5 0BC2            வூ
E8   232  0BB4 0BC2            ழூ
E9   233  0BB3 0BC2            ளூ
EA   234  0BB1 0BC2            றூ
EB   235  0BA9 0BC2            னூ
EC   236  0B95 0BCD            க்
ED   237  0B99 0BCD            ங்
EE   238  0B9A 0BCD            ச்
EF   239  0B9E 0BCD            ஞ்
F0   240  0B9F 0BCD            ட்
F1   241  0BA3 0BCD            ண்
F2   242  0BA4 0BCD            த்
F3   243  0BA8 0BCD            ந்
F4   244  0BAA 0BCD            ப்
F5   245  0BAE 0BCD            ம்
F6   246  0BAF 0BCD            ய்
F7   247  0BB0 0BCD            ர்
F8   248  0BB2 0BCD            ல்
F9   249  0BB5 0BCD            வ்
FA   250  0BB4 0BCD            ழ்
FB   251  0BB3 0BCD            ள்
FC   252  0BB1 0BCD            ற்
FD   253  0BA9 0BCD            ன்
FE   254  0B87                 இ
FF   255  (unassigned)

In the table above, code 80 is U+0BE6 TAMIL DIGIT ZERO, which was accepted into Unicode in version 4.1. Code A0 is the NO-BREAK SPACE. The codes AD and FF are unassigned.
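As a rough illustration of how the table is read, the following Python sketch looks up single bytes. It shows that the lower half passes through as ASCII, that one TSCII code can expand to several Unicode code points, and that AD and FF are rejected. The names TSCII_EXCERPT, UNASSIGNED, decode_byte and decode are hypothetical, the dictionary holds only a few rows of the table, and the sketch does no visual-to-logical reordering (none of the listed entries needs it).

```python
# A few rows copied from the code table above; not the full mapping.
TSCII_EXCERPT = {
    0x80: "\u0BE6",                    # Tamil digit zero
    0x82: "\u0BB8\u0BCD\u0BB0\u0BC0",  # one byte, four code points
    0x87: "\u0B95\u0BCD\u0BB7",        # one byte, three code points
    0xA0: "\u00A0",                    # no-break space
    0xAB: "\u0B85",                    # letter A
    0xEC: "\u0B95\u0BCD",              # KA with pulli
}

# Codes the article lists as unassigned.
UNASSIGNED = {0xAD, 0xFF}


def decode_byte(byte: int) -> str:
    """Map one TSCII byte to its Unicode string, using the excerpt above."""
    if byte < 0x80:
        return chr(byte)  # lower half is plain ASCII
    if byte in UNASSIGNED:
        raise ValueError(f"0x{byte:02X} is unassigned in TSCII")
    if byte not in TSCII_EXCERPT:
        raise LookupError(f"0x{byte:02X} is not part of this excerpt")
    return TSCII_EXCERPT[byte]


def decode(data: bytes) -> str:
    """Decode a byte string one code at a time (no reordering is attempted)."""
    return "".join(decode_byte(b) for b in data)


print(len(decode_byte(0x82)))  # 4
```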

Conversion Tools

UTF-8 encoded documents can be converted to TSCII with the GNU iconv tool as follows:

$ iconv -f utf-8 -t tscii hello.utf8 > hello.tscii

Conversion from TSCII to UTF-8 works the same way, with the encoding names given to the -f and -t flags interchanged.
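When the conversion has to be driven from a script, the same command can be wrapped in Python. This is only a sketch: it assumes an iconv binary on the PATH that knows the name "tscii", and the helper names iconv_args and convert_file are hypothetical.

```python
import subprocess


def iconv_args(source, target, path):
    """Build the iconv argument list for converting `path` from source to target."""
    return ["iconv", "-f", source, "-t", target, path]


def convert_file(source, target, in_path, out_path):
    """Run iconv and write the converted bytes to out_path.

    Raises subprocess.CalledProcessError if iconv reports an error, for
    example when the input contains characters the target cannot represent.
    """
    result = subprocess.run(
        iconv_args(source, target, in_path),
        check=True,
        capture_output=True,
    )
    with open(out_path, "wb") as handle:
        handle.write(result.stdout)
```

Calling convert_file("utf-8", "tscii", "hello.utf8", "hello.tscii") corresponds to the command shown above; swapping the first two arguments gives the reverse direction.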

References

  1. http://www.iana.org/assignments/charset-reg/TSCII
  2. http://www.infitt.org/tscii/archives/msg00001.html
  3. http://www.infitt.org/tscii/archives/maillist.html
This article is issued from Wikipedia - version of the 5/28/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.