BACKGROUND
==========

Before describing the proposed primitives, some background information is useful.

FORTRAN has never offered satisfactory support for character data. Indeed, some compilers in use until the mid-1960's did not even have Hollerith data items or A FORMAT descriptors, or LOGICAL variables, for that matter. When limited character support became widely available in FORTRAN, it was restricted to Hollerith string constants of the form 9HCHEMISTRY, together with the A FORMAT item. The 1966 ANSI FORTRAN Standard permitted Hollerith constants to occur only in DATA and FORMAT statements and as subroutine arguments in CALL statements (but not in FUNCTION references, although no compiler that I am aware of enforces this restriction). No CHARACTER data type was introduced, and characters were forced to masquerade as other data types.

Coding Hollerith strings is tedious and error-prone, because the characters must be counted by hand. Consequently, many manufacturers permitted character constants to be enclosed in delimiter characters, for example, "CHEMISTRY", but no general agreement was reached about what the delimiter characters ought to be. Single and double quotes are most common, but asterisks and not-equal signs have also been used. When string delimiters are used, the question arises of how to represent the delimiter character itself in a string constant. The doubled-delimiter approach, "O""MALLEY" for the string O"MALLEY, has usually been followed, although CDC's use of the asterisk as a string delimiter simply prohibited its appearance as a string character. As a result of these variations, only the Hollerith string can be relied upon for portability, and some installations have automated tools for converting between the different string conventions in FORTRAN source programs. The 1966 implementation of support for character data is just about the worst possible.
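The delimiter conventions just described lend themselves to the kind of automated conversion some installations use. The Python sketch below decodes one delimited constant under the doubled-delimiter rule and re-emits it as a portable Hollerith constant, doing the character counting that makes hand-coded Hollerith strings error-prone. The function names `parse_delimited` and `to_hollerith` are hypothetical; the sketch handles a single constant rather than scanning whole FORTRAN source, and it assumes the delimiter may appear inside the string only when doubled (under a convention like CDC's, which forbids the delimiter inside the string, the doubled form simply never occurs).

```python
def parse_delimited(literal: str, delim: str = '"') -> str:
    """Decode a delimited string constant using the doubled-delimiter rule."""
    if len(literal) < 2 or literal[0] != delim or literal[-1] != delim:
        raise ValueError("constant must begin and end with the delimiter")
    body = literal[1:-1]
    chars = []
    i = 0
    while i < len(body):
        if body[i] == delim:
            # Inside the constant, the delimiter may appear only when doubled.
            if i + 1 < len(body) and body[i + 1] == delim:
                chars.append(delim)
                i += 2
                continue
            raise ValueError(f"undoubled delimiter at offset {i} of {literal!r}")
        chars.append(body[i])
        i += 1
    return "".join(chars)


def to_hollerith(text: str) -> str:
    """Re-emit text as a Hollerith constant nH..., counting the characters."""
    if not text:
        raise ValueError("a Hollerith constant needs at least one character")
    return f"{len(text)}H{text}"


print(to_hollerith(parse_delimited('"CHEMISTRY"')))       # 9HCHEMISTRY
print(to_hollerith(parse_delimited('"O""MALLEY"')))       # 8HO"MALLEY
print(to_hollerith(parse_delimited("*CHEMISTRY*", "*")))  # 9HCHEMISTRY
```

Note that the doubled delimiter counts as one character in the Hollerith form: "O""MALLEY" occupies ten characters of source between the delimiters' outer edges but yields the eight-character constant 8HO"MALLEY.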
The Hollerith form is certainly undesirable. Even worse is the convention for the internal storage of character strings. These must always be stored left-justified in a computer word and right-padded with blanks if the number of characters specified does not fill an integral number of machine words. The number of characters that fit in a word ranges from 1 to 10 on existing computers [BEEB79], and left-justification means that even if one arranges to store only one character per word for word-length independence, the character will occupy the most significant bit positions, and probably the sign bit as well. As a result, even comparing characters for equality can cause an arithmetic overflow on machines that implement comparisons by subtraction. Nor can the numerical value of a character be obtained portably: dividing by a power of two to shift the bit pattern right gives the wrong answer when the sign bit is 1.

Another problem is that the results may differ from machine to machine, depending upon the FORTRAN type of the variable in which the characters are stored. For example, storing characters in LOGICAL variables is impossible on machines that implement LOGICAL scalars and arrays as bit strings, and on most other machines, the 1966 Standard's prohibition of the relational operators .EQ., .NE., .LT., etc. between LOGICAL variables would prevent character comparisons. Floating-point types are also unsuitable, because the mantissa normalization that may occur during assignment or expression evaluation will usually scramble the bits, destroying the characters stored in the word. This leaves INTEGER variables and arrays as the only possible repository of character data, and even this may fail.
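To make the sign-bit hazard concrete, the Python sketch below models a hypothetical 36-bit two's-complement machine that stores six 6-bit characters per word, left-justified and blank-padded, using the DEC SIXBIT convention (code = ASCII value minus 32, so a blank is 0 and A is 33). The word size, character size, and code are assumptions chosen for illustration, and the function names are hypothetical. A single letter stored left-justified sets the sign bit, so FORTRAN-style integer division by 2**30, intended as a right shift, yields -31 for A instead of 33, and subtracting two such words to compare them overflows the 36-bit range.

```python
WORD_BITS = 36                        # assumed word size
CHAR_BITS = 6                         # assumed character size
CHARS_PER_WORD = WORD_BITS // CHAR_BITS
SHIFT = WORD_BITS - CHAR_BITS         # bits below the leading character


def sixbit(ch: str) -> int:
    """SIXBIT code: ASCII value minus 32, defined for ' ' through '_'."""
    code = ord(ch) - 32
    if not 0 <= code < 64:
        raise ValueError(f"{ch!r} has no 6-bit code")
    return code


def to_signed(u: int) -> int:
    """Read an unsigned WORD_BITS-bit pattern as a two's-complement integer."""
    return u - (1 << WORD_BITS) if u >> (WORD_BITS - 1) else u


def pack(text: str) -> list[int]:
    """Store text left-justified, right-padded with blanks, in whole words."""
    nwords = -(-len(text) // CHARS_PER_WORD)          # ceiling division
    text = text.ljust(nwords * CHARS_PER_WORD)        # blank padding
    words = []
    for i in range(0, len(text), CHARS_PER_WORD):
        u = 0
        for ch in text[i:i + CHARS_PER_WORD]:
            u = (u << CHAR_BITS) | sixbit(ch)
        words.append(to_signed(u))
    return words


def fortran_div(a: int, b: int) -> int:
    """Integer division truncating toward zero, as FORTRAN specifies."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def leading_char_by_division(word: int) -> int:
    """Attempt to 'right shift' the leading character into the low bits."""
    return fortran_div(word, 1 << SHIFT)


def subtraction_overflows(a: int, b: int) -> bool:
    """True if a - b does not fit in a signed WORD_BITS-bit word."""
    diff = a - b
    return not -(1 << (WORD_BITS - 1)) <= diff < (1 << (WORD_BITS - 1))


a_word = pack("A")[0]                 # 'A' followed by five blanks
q_word = pack("?")[0]                 # '?' has SIXBIT code 31: sign bit clear
print(a_word < 0)                                  # True: sign bit is set
print(leading_char_by_division(a_word))            # -31, not 33
print(leading_char_by_division(q_word))            # 31, correct
print(subtraction_overflows(q_word, a_word))       # True
```

Any character whose 6-bit code is 32 or more (in this assumed code, @ and the letters among others) sets the sign bit when left-justified, which is why the division trick is unreliable even with one character per word.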
On the IBM 7030 Stretch computer, for example, even INTEGER storage is awkward: integers are represented internally as floating-point numbers, and without resorting to assembly language it is very inconvenient even to get character data correctly into and out of variables on that machine.

The 1977 FORTRAN Standard attempts to remedy these difficulties by introducing a CHARACTER data type, but it still does not offer a complete solution.

First, the Hollerith data type has been dropped from the 1977 Standard. This means that a very large body of existing FORTRAN software that uses character data, even in a fashion that is at present widely portable, may require extensive changes to run with a FORTRAN 77 compiler, unless manufacturers can be pressed to continue supporting character data stored in Hollerith constants and variables. The 1977 Standard also prohibits all storage equivalencing between CHARACTER data and all other FORTRAN data types, whether through COMMON and EQUIVALENCE statements or through FUNCTION and SUBROUTINE argument association. This was necessary to enable FORTRAN 77 to support variable-length character strings, so that declarations of the form

      SUBROUTINE A (B,C)
      CHARACTER B*(*),C(*)*(*)

could be permitted, allowing CHARACTER variables to inherit both a size and an array length from a calling program. This forces a compiler to generate code that passes to a called routine the address of a string descriptor containing size and dimension information, as well as the actual address of the character data.

Second, the 1977 Standard provides almost no standardized library support for character data: apart from the intrinsic functions LEN and INDEX, the lexical comparison functions LGE, LGT, LLE, and LLT, and the ICHAR and CHAR functions for converting between INTEGER and CHARACTER form, no useful utility routines are defined.

Third, null character strings, that is, strings of zero length, are not permitted. Null strings are in fact quite useful, and even necessary in some applications. In particular, a null string cannot be simulated by any string of non-zero length: appending even a single blank to another string changes the result, whereas appending a null string does not.
Fourth, the 1977 Standard does not specify the character set to be used. Because many manufacturers employ their own private character sets, each with its own repertoire of special characters and its own collating sequence, this omission only perpetuates machine dependence for FORTRAN users.
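The string descriptor described under the first point, which lets dummy arguments such as B*(*) and C(*)*(*) inherit a length and an array size from the caller, can be pictured with a toy Python model. This is a sketch under stated assumptions: the class `CharDescriptor`, its fields, and the routine `trimmed_elements` are hypothetical, a byte string stands in for memory, and the layout follows only the text's description (an address plus size and dimension information), not any particular compiler's calling convention. The point is that the called routine declares no length of its own; it reads the element length and count from whatever descriptor the caller passes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharDescriptor:
    """Toy string descriptor; field names are hypothetical."""
    storage: bytes   # stands in for the caller's memory
    offset: int      # "address" of the first character of element 1
    elem_len: int    # size of each element, inherited by a *(*) dummy
    count: int       # number of elements, the dimension information

    def element(self, i: int) -> str:
        """Return element i using FORTRAN's 1-based subscripts."""
        if not 1 <= i <= self.count:
            raise IndexError(f"subscript {i} outside 1..{self.count}")
        start = self.offset + (i - 1) * self.elem_len
        return self.storage[start:start + self.elem_len].decode("ascii")


def trimmed_elements(c: CharDescriptor) -> list[str]:
    """A 'called routine' that declares no length: it reads it from c."""
    return [c.element(i).rstrip(" ") for i in range(1, c.count + 1)]


memory = b"HYDROGENHELIUM  LITHIUM "          # 24 bytes of caller storage
names = CharDescriptor(memory, offset=0, elem_len=8, count=3)
quads = CharDescriptor(memory, offset=0, elem_len=4, count=6)
print(trimmed_elements(names))   # ['HYDROGEN', 'HELIUM', 'LITHIUM']
print(trimmed_elements(quads))   # ['HYDR', 'OGEN', 'HELI', 'UM', 'LITH', 'IUM']
```

Both descriptors refer to the same 24 bytes; the same routine sees three 8-character elements through one and six 4-character elements through the other. Because the routine depends on the descriptor rather than on the storage type of its argument, this model also suggests why associating CHARACTER storage with other data types was ruled out.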