.\"-- << kanji code converter >> ----
.\"
.\" kcc.jman
.\"                                                 Aug 24 1992
.\"                                     mod:        Nov 19 1992
.\"				translated:	    Oct 15 1999
.\"------------------------------------------------ tonooka ------------
.\" @(#)kcc.jman 2.1 (Y.Tonooka) 9/6/93
.TH KCC L "November 19, 1992" "Y. Tonooka"
.SH NAME
kcc \- Kanji code converter with encoding auto detection
.SH SYNOPSIS
.B kcc
[
.BI \- IO chnvxz
] [
.BI \-b " bufsize"
]
.RI [ " file " "] .\|.\|."
.SH DESCRIPTION
.B kcc
is a filter that reads each
.I file
sequentially, converts its kanji encoding, and writes the result to stdout.
If no file is specified, or if
.B \-
is given as a filename, it reads from stdin.
You can specify the kanji encodings for input and output.
If you do not specify the input encoding,
.B kcc
detects it automatically.
.LP
Available kanji encodings are
.SM JIS
(7 bit and/or 8 bit), Shift
.SM JIS\s0,
\s-1EUC\s0, and \s-1DEC\s0.
In the input, 7 bit
.SM JIS
may be mixed with one of \s-1EUC\s0, \s-1DEC\s0, or Shift
.SM JIS.
Both
.BR \s-1SI\s0 / \s-1SO\s0
and
.B \s-1ESC\s0(I
are recognized as
.SM JIS
halfwidth kana designations.
.SH OPTIONS
.TP
.BI \- O
.PD 0
.TP
.BI \- IO
.I I
is the input kanji encoding and
.I O
is the output kanji encoding.  When no input encoding is specified,
it is detected automatically.  If neither the input nor the output
encoding is specified, the output encoding is 7 bit
.SM JIS.
.PD
.IP
You can specify one of the following for the input encoding option
.IR I .
.LP
.RS 10
.PD 0
.TP
.B e
.SM EUC
(may be mixed with 7 bit
.SM JIS\s0)
.TP
.B d
.SM DEC
(may be mixed with 7 bit
.SM JIS\s0)
.TP
.B s
Shift
.SM JIS
(may be mixed with 7 bit
.SM JIS\s0)
.TP
.BR j ", " 7 ", or " k
7 bit
.SM JIS
.TP
.B 8
8 bit
.SM JIS
.PD
.RE
.IP
You can specify one of the following for the output encoding option
.IR O .
.LP
.RS 10
.PD 0
.TP
.B e
.SM EUC
.TP
.B d
.SM DEC
.TP
.B s
Shift
.SM JIS
.TP
.BR j\fIXY " or " 7\fIXY
7 bit
.SM JIS
(using
.BR \s-1SI\s0 / \s-1SO\s0
for the
.SM JIS
kana designation)
.TP
.BI k XY
7 bit
.SM JIS
(using
.B \s-1ESC\s0(I
for the
.SM JIS
kana designation)
.TP
.BI 8 XY
8 bit
.SM JIS
.PD
.RE
.IP
The
.I XY
suffix of the
.I O
option selects the escape sequences used in
.SM JIS
output.
.B BJ
is the default.  The supplemental kanji designation is fixed to
.BR \s-1ESC\s0$(D .
.LP
.RS 10
.PD 0
.TP
.I X
Kanji is designated by:
.RS 5
.TP
.B B
.BR \s-1ESC\s0$B " (JIS X0208-1983)"
.TP
.B @
.BR \s-1ESC\s0$@ " (JIS X0208-1978)"
.TP
.B +
.BR \s-1ESC\s0&@\s-1ESC\s0$B " (JIS X0208-1990)"
.RE
.TP
.I Y
Alphanumerics are designated by:
.RS 5
.TP
.B B
.BR \s-1ESC\s0(B " (ASCII)"
.TP
.B J
.BR \s-1ESC\s0(J " (JIS Roman; JIS X0201)"
.TP
.B H
.BR \s-1ESC\s0(H " (Swedish; strongly deprecated)"
.PD
.RE
.RE
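.IP
The table above can be sketched as a small lookup in Python.  This is an
illustration only, not code from kcc; the names KANJI, ALNUM, and
designations are hypothetical.
.nf
```python
# Hypothetical sketch of the XY suffix table; not taken from the kcc source.
ESC = bytes([0x1B])

# X: escape sequence that designates kanji.
KANJI = {
    "B": ESC + b"$B",
    "@": ESC + b"$@",
    "+": ESC + b"&@" + ESC + b"$B",
}

# Y: escape sequence that designates alphanumerics.
ALNUM = {
    "B": ESC + b"(B",
    "J": ESC + b"(J",
    "H": ESC + b"(H",
}

def designations(xy="BJ"):
    """Return (kanji_escape, alnum_escape) for an XY suffix; BJ is the default."""
    x, y = xy
    return KANJI[x], ALNUM[y]
```
.fi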
.TP
.B \-v
Output the result of input encoding detection to stderr.
.TP
.B \-x
Extension mode.  During auto detection of the input encoding, also recognize
user-defined characters and extended character regions (the user-defined
area of \s-1EUC\s0, the undefined halfwidth kana region, the
.SM C1
control character area, and the extended character region of Shift
.SM JIS\s0).
.SM DEC
and
.SM EUC
are distinguished in this mode.
.TP
.B \-z
Shrink mode.  Do not recognize halfwidth kana (except in 7 bit
.SM JIS\s0)
during input encoding detection.
With this option, auto detection of the input encoding becomes
much more accurate for files without halfwidth kana.
.TP
.B \-h
Normally, halfwidth kana become fullwidth katakana when converted to
.SM DEC.
With this option, they become hiragana.
.TP
.B \-n
User-defined characters, extended characters, and supplemental kanji
characters are converted to a fullwidth white box, and the undefined region of
halfwidth kana is converted to a halfwidth centered dot.
.TP
.BI \-b " bufsize"
Specify the buffer size.  The default is 8 kbytes.
.TP
.B \-c
Do not convert; check the input encoding and print the result to stdout.
Unlike normal auto detection, the whole contents of the file are checked.
However, when an inconsistency of encodings is found, reading is aborted and
"data" is printed.  Options other than \fB\-x\fR and \fB\-z\fR are ignored.
.SH EXAMPLES
.IP "\fB% kcc \-e \fIfile"
The input encoding is detected automatically, and the output is in
.SM EUC
encoding.
.IP "\fB% kcc \-sj \fIfile1 file2"
Two Shift
.SM JIS
files are concatenated and converted to 7 bit
.SM JIS.
.IP "\fB% \fIcommand\fB | kcc \-k+J"
The output of
.I command
is converted to 7 bit
.SM JIS
(kanji in \s-1JIS\s0 X0208, alphanumerics in \s-1JIS\s0 Roman, and
halfwidth kana designated by
.BR \s-1ESC\s0(I ).
.IP "\fB% kcc \-c \fIfile"
The encoding of
.I file
is detected (no conversion is done).
.SH BUGS
Auto detection of the input encoding works well in ordinary cases, but
it has the following problems.
.LP
7 bit
.SM JIS
is recognized with certainty by its escape sequences.
\s-1EUC\s0 and
.SM DEC
are treated as the same (referred to as the
.SM EUC
series).  The halfwidth kana of 8 bit
.SM JIS
are the same as the halfwidth kana of Shift
.SM JIS
(referred to as the Shift
.SM JIS
series).  However, the
.SM EUC
series and the Shift
.SM JIS
series, which are both 8 bit encodings, share wide regions of code space.
The difficulty in auto detection is therefore telling these two encodings apart.
.LP
The
.SM EUC
series and the Shift
.SM JIS
series are told apart line by line.  As soon as the input is found not
to be the Shift
.SM JIS
series, or not to be the \s-1EUC\s0 series, the encoding is determined.
If an inconsistency is found, the input is treated as "data" and the
contents of the output are not guaranteed.
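.LP
The elimination described above can be sketched in Python.  This is only an
approximation: it relies on Python's euc_jp and shift_jis codecs rather than
on the range checks kcc itself performs, and the function names are
hypothetical.
.nf
```python
# Hypothetical sketch; Python's codecs stand in for kcc's own range checks.

def decodes(data, codec):
    """Return True if data is valid in the given codec."""
    try:
        data.decode(codec)
        return True
    except UnicodeDecodeError:
        return False

def guess(data):
    """Classify bytes as '7bit', 'euc', 'sjis', 'ambiguous', or 'data'."""
    euc = sjis = True          # both series are candidates at first
    seen_8bit = False
    for line in data.splitlines():
        if all(b < 0x80 for b in line):
            continue           # 7 bit lines rule nothing out
        seen_8bit = True
        euc = euc and decodes(line, "euc_jp")
        sjis = sjis and decodes(line, "shift_jis")
        if not euc and not sjis:
            return "data"      # inconsistent input
        if euc != sjis:
            return "euc" if euc else "sjis"
    if not seen_8bit:
        return "7bit"
    return "ambiguous"
```
.fi
In this sketch a line made only of an even number of Shift
.SM JIS
halfwidth kana bytes stays "ambiguous", because the same bytes also form valid
.SM EUC
kanji.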
.LP
After an 8 bit code is found and until the choice between the
.SM EUC
series and the Shift
.SM JIS
series is made, conversion is suspended and the input data is kept in a
buffer.  If the buffer becomes full,
.B kcc
assumes the
.SM EUC
series and forces conversion to start.  The rationale is as follows.
Documents containing kanji usually include
.SM JIS
non-kanji or
.SM JIS
first level kanji, and in Shift
.SM JIS
these fall in a region not shared with
.SM EUC\s0,
so Shift
.SM JIS
can be detected with certainty.
If the encoding still cannot be determined, the input is very likely to be
.SM EUC.
.LP
If the input is 8 bit
.SM JIS
and its halfwidth kana always occur in runs of even length,
it will be wrongly detected as \s-1EUC\s0 kanji.  Be careful.
.LP
If the input does not contain halfwidth kana, use
.B \-z
to make detection much more accurate.
This is because the shared region is then restricted to the area of
.SM JIS
second level kanji.
.LP
The extended region of Shift
.SM JIS\s0,
the user-defined area of \s-1EUC\s0, the
.SM C1
control characters of \s-1EUC\s0, and the undefined halfwidth kana region
of \s-1EUC\s0 are outside the scope of auto detection, so detection fails
if the input contains such characters.  Use the
.B \-x
option to select extension mode, or specify the input encoding.
.SH "SEE ALSO"
.BR cat (1)
.SH NOTES
Usually, user-defined characters, extended characters, and supplemental kanji
characters are mapped to their respective counterparts.  However, characters
that are outside the range of the extended characters become FCFC in
hexadecimal when converted to Shift
.SM JIS.
Although the
.SM C1
control character region of
\s-1EUC\s0 and
.SM DEC
is preserved when converted to
.SM JIS\s0,
it is deleted when converted to Shift
.SM JIS.
The undefined area of halfwidth kana becomes a halfwidth centered dot
when converted to Shift
.SM JIS.
Halfwidth kana become fullwidth kana when converted to
.SM DEC.
.LP
When the output is in
.SM JIS
encoding, control characters such as newline, TAB, and DEL, and the
(halfwidth) space, are output in ASCII mode.
.LP
When the input encoding is detected wrongly, or the input contains characters
undefined in the expected character set, the output is undefined.
.LP
This manual was translated by Fumitoshi UKAI <ukai@debian.or.jp>
for the Debian system, but you can use it for any purpose.