CRITERIA FOR SATISFACTORY SUPPORT OF CHARACTER DATA
===================================================

The reaction of some people on reading the above criticisms, or on having experienced these problems personally, will no doubt be to reject FORTRAN completely as a language in which to do any kind of character manipulation. There is certainly some validity to this view.

However, as one Conference participant remarked, there is really no choice in the matter, for FORTRAN 66 is the only "(almost) machine-independent high-level 'assembly' language" that we have for scientific computation. FORTRAN is available on essentially all medium- and large-scale computers in the world today, and on many microcomputers as well. It has been in existence for nearly twenty-five years, and is one of the two or three original high-level programming languages still in use. It is widely understood by scientists and engineers the world over. A widely-implemented ANSI and ISO Standard has been in existence for fourteen years, and in fact, FORTRAN was probably the first language to be so standardized. An enormous amount of FORTRAN software, representing a huge investment of money and programmer-years, already exists, and sophisticated and extensive scientific subroutine libraries such as IMSL, Harwell, Boeing, NAG, EISPACK, FUNPACK, and LINPACK are widely available. FORTRAN's lack of structured control statements, though unfortunately not its limited variety of data types, can be largely overcome by programming in a preprocessor language, such as RATFOR or SFTRAN3, which can then be translated into Portable FORTRAN. Finally, and importantly, there exist automated tools such as the PFORT Verifier, which can be used to test FORTRAN software for adherence to Portable FORTRAN syntax, grammar, and usage.

In constructing a set of character primitives for widespread implementation on a variety of host machines, two goals must be kept in mind.
First of all, the primitives should provide frequently-needed functions. Examples of these include packing and unpacking of characters, obtaining integer equivalents, comparing and moving strings, and letter case and character set conversions. Second, they should permit machine-independent implementation of programs which manipulate character data.

The second goal carries with it an important decision: a standard character set must be adopted, or at least be available via function calls. Only then can such operations as sorting by collating sequence, or the use of integer equivalents of characters to govern the flow of control in programs such as parsers and lexical analyzers, be implemented in a fashion which guarantees that the same results will be obtained, independent of the host computer.

Fortunately, there is at present an internationally-agreed-upon character set, known as ASCII (American Standard Code for Information Interchange), defined in ANSI Standard X3.4-1968 and revised in X3.4-1977. It has been adopted in Japan as the Japanese Industrial Standard Code for Information Interchange (JISCII) (1969), and by the International Standards Organization as ISO DR 1052 (1967). Unfortunately, at present the American "Big 3" computer manufacturers IBM, CDC, and UNIVAC do not provide wide support for ASCII, although both UNIVAC and CDC are evidently moving in that direction.

ASCII is a 7-bit code offering 2**7 or 128 different characters, made up of 32 standard control characters, followed by a space, then the special characters !"#$%&'()*+,-./, the digits 0-9, the special characters :;<=>?@, upper-case letters A-Z, special characters [\]^_`, lower-case letters a-z, special characters {|}~, and finally, a DELete control character. With the exception of the DELete control character, the special characters following the letters may be replaced with national characters for those alphabets having more than 26 letters.
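
The layout just described can be checked mechanically. The following is a minimal sketch in Python, used here purely for illustration and not as part of the proposed FORTRAN interface; the names GROUPS, ascii_table, and TABLE are hypothetical. It concatenates the groups in the order listed above and lets the position of each character in the resulting table serve as its integer equivalent.

```python
import string

# Groups of the 7-bit ASCII code, in code order, as listed in the text.
# GROUPS, ascii_table, and TABLE are illustrative names only.
GROUPS = [
    ("control characters", [chr(i) for i in range(32)]),
    ("space", [" "]),
    ("special characters", list("!\"#$%&'()*+,-./")),
    ("digits", list(string.digits)),
    ("special characters", list(":;<=>?@")),
    ("upper-case letters", list(string.ascii_uppercase)),
    ("special characters", list("[\\]^_`")),
    ("lower-case letters", list(string.ascii_lowercase)),
    ("special characters", list("{|}~")),
    ("DELete control character", [chr(127)]),
]


def ascii_table():
    """Concatenate the groups into one list indexed by integer equivalent."""
    table = []
    for _name, chars in GROUPS:
        table.extend(chars)
    return table


TABLE = ascii_table()

# Integer equivalent of a character = its position in the table.
code_of_upper_a = TABLE.index("A")
code_of_zero = TABLE.index("0")
```

The groups sum to 32 + 1 + 15 + 10 + 7 + 26 + 6 + 26 + 4 + 1 = 128 entries, which is the 2**7 figure given above.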
Standardization work is going on at present to expand the code to 8 bits, and Cyrillic and Japanese Katakana characters have already been assigned to codes in the range 128-255 for use in the Soviet Union and Japan.

This proposal recommends the adoption of ASCII as the standard character set, and defines functions that allow access to it even on those computers which do not yet use it. It is worth noting in passing that the new U.S. Department of Defense programming language, ADA [SIGP79], has specified that all character data shall be in the ASCII character set, independent of the host computer.
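
As one illustration of how access to ASCII might be provided on a machine whose native character set differs, the sketch below simulates an EBCDIC host using Python's "cp037" codec. This is an assumption made only for the example. The names HOST_CODEC, TO_ASCII, ascii_equivalent, and ascii_sorted are hypothetical and are not the primitives defined by this proposal. The idea is simply a table lookup from the native character code to the ASCII integer equivalent, which is then used as the sort key.

```python
# Simulated non-ASCII host: Python's "cp037" codec is one EBCDIC variant.
HOST_CODEC = "cp037"  # assumption, for illustration only

# Translation table: native character code -> ASCII integer equivalent,
# or -1 where the native code has no ASCII counterpart.
TO_ASCII = [-1] * 256
for ascii_value in range(128):
    native_value = chr(ascii_value).encode(HOST_CODEC)[0]
    TO_ASCII[native_value] = ascii_value


def ascii_equivalent(native_value):
    """Return the ASCII integer equivalent of a native character code."""
    return TO_ASCII[native_value]


def ascii_sorted(native_codes):
    """Sort native character codes by the ASCII collating sequence."""
    return sorted(native_codes, key=ascii_equivalent)


text = "aB3".encode(HOST_CODEC)  # characters as stored on the host
native_order = bytes(sorted(text)).decode(HOST_CODEC)
ascii_order = bytes(ascii_sorted(text)).decode(HOST_CODEC)
```

On the simulated host, sorting by native codes leaves the three characters in the order a, B, 3, whereas sorting by ASCII integer equivalents gives 3, B, a. A program that sorts through the table lookup would therefore produce the same order on either kind of machine.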