Original version: 
            
                Fri Jul 18 16:32:04 1997
            
            Last updates: 
            
                Sat Jun 13 14:03:05 1998   
                Thu Oct 02 00:22:12 2003   
                Fri Nov 12 15:55:21 2004
            
        
Writing HTML files without a validating parser is like trying to write computer programs without a compiler: don't do it! Fortunately, help is readily available on the Internet.
James Clark <jjc@jclark.com> is developing a new implementation of a suite of SGML parser tools, called SP. These include:
nsgmls
            sgmls-compatible validating SGML
                parser.
            spam
            sgmlnorm
            spent
            
            Besides being a complete redesign of the earlier successful
            sgmls implementation, the new programs are
            designed for the future:  they support extended character
            sets, such as Unicode, and various multi-byte encodings used
            for East Asian languages.
        
The new code is written almost entirely in C++ (just over 50K lines at version 1.0.1, or 2.5 times the size of Don Knuth's TeX or Metafont), and requires template support, a relatively new feature of C++ which is not yet widely available.
WARNING: To build these programs, you will need about 50MB of disk space, unless you remove the default -g compiler option. Doing so reduces the executable sizes from almost 10MB each to about 1.5MB (on a Sun SPARC Solaris 2.3 system). Alternatively, you can build them, then run the UNIX strip command on the executables to remove debug symbols.
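The disk space requirement can be checked before starting a build. The following is only a minimal sketch, assuming a POSIX shell and a df that supports the POSIX -P and -k options; the function names free_kb and enough_space are hypothetical, and are not part of the SP distribution.

```shell
# Report the free space, in kilobytes, on the file system holding directory $1.
# With -P, df prints one header line and then one line per file system,
# with the available space in the fourth column.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed if at least $2 kilobytes are free in directory $1.
enough_space() {
    [ "$(free_kb "$1")" -ge "$2" ]
}

# Example (not run here): about 50MB is 51200 kilobytes.
#   enough_space . 51200 || echo "less than 50MB free in build directory"
```

The threshold is passed as an argument, so the same check can be reused with the larger figure needed for a gcc build.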
            The SP programs can be compiled and built using
            recent releases of
            
                GNU g++ and libg++
            
            (2.7.1 or later: patches to gcc 2.7.0 are included in the
            SP distribution). g++ itself is built as part
            of the GNU gcc compiler installation;
            although that installation takes a few hours, and requires
            about 120MB of disk space to be able to run the validation
            tests before installation, it is straightforward, and
            should be problem free on most current UNIX systems.  The
            GNU compiler suite has also been built on IBM PC MS DOS
            and DEC OpenVMS systems, although those versions usually
            lag behind.
        
            WARNING: With at least libg++ 2.7.1, there is an
            installation problem that has been reported to the
            developers: make install does not
            install libio.a, libiostream.a,
            and librx.a.  libiostream.a is
            required for building SP, and most other C++
            programs.  To remedy this, I did the following steps
            manually in the libg++ directory:
        
(cd librx; make install)
cp libio/libio*.a /usr/local/lib
            Unfortunately, I only discovered this problem after having
            built libg++ on 8 systems, and then having
            deleted the build trees after the make
            install, so I had to do it all over again, sigh...
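
Before deleting a libg++ build tree, it may be worth confirming that the three libraries really were installed. This is a minimal sketch, assuming a POSIX shell; the function name missing_libgxx_libs is hypothetical, and the directory to check is whatever library directory your installation uses (/usr/local/lib in the commands above).

```shell
# Print the name of each library that is absent from directory $1.
# Nothing is printed if all three are present.
missing_libgxx_libs() {
    libdir=$1
    for lib in libio.a libiostream.a librx.a; do
        [ -f "$libdir/$lib" ] || echo "$lib"
    done
}

# Example (not run here):
#   missing_libgxx_libs /usr/local/lib
```

An empty result suggests that the build tree is no longer needed for these libraries; any names printed still have to be installed by hand.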
        
            The SP distribution site has binaries for IBM
            PC DOS, Intel 386 Linux, Intel 386 Windows NT 3.5, Sun SunOS
            4.1.3 and Sun Solaris 2.3, so if you have such a system, you
            may not need to build any of the SP code from
            scratch, or to install g++.  Binaries are also
            available for the previous version (0.4) for DEC Alpha OSF/1
            3.x and IBM PC OS/2 systems.
        
            Just as with
            
                sgmls,
            
            lengthy command lines are needed to run these programs
            successfully.  To facilitate their use, I've prepared simple
            UNIX shell scripts
            
                html-ncheck
            
            and
            
                html-spam
            
            to hide the complexity, so that only the HTML files need to
            be provided on the script command lines.
        
            If you have installed the html-check
            distribution, and you want to use html-spam,
            you need to add to the end of the HTML catalog file,
            
                /usr/local/lib/html-check/lib/catalog,
            
            these lines:
        
        -- Added at the suggestion of James Clark <jjc@jclark.com> --
        -- so that spam -p doesn't output the contents of html.decl --
SGMLDECL html.decl
        
            Without this change, the contents of html.decl
            are copied to the output if the -p option is included
            in the spam invocation in html-spam;
            omitting -p and including
            html.decl doesn't help, because the <!DOCTYPE ...>
            line is then lost.
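
The catalog change can also be scripted so that it is safe to repeat. This is a minimal sketch, assuming a POSIX shell; the function name add_sgmldecl is hypothetical and is not part of the html-check or SP distributions, and the catalog path is passed as an argument rather than assumed.

```shell
# Append the SGMLDECL entry to the catalog file named by $1,
# unless the file already contains an SGMLDECL entry.
add_sgmldecl() {
    catalog=$1
    if grep -q '^SGMLDECL[[:space:]]' "$catalog"; then
        return 0    # already present: leave the catalog unchanged
    fi
    cat >> "$catalog" <<'EOF'
-- so that spam -p doesn't output the contents of html.decl --
SGMLDECL html.decl
EOF
}

# Example (not run here):
#   add_sgmldecl /usr/local/lib/html-check/lib/catalog
```

Because the function checks for an existing entry first, running it twice leaves only one SGMLDECL line in the catalog.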
        
            I have successfully built sp-1.0.1 with
            g++  (gcc 2.7.1 [13-Nov-1995] and
            libg++ [15-Nov-1995]) on these systems:
        
using the command
make && make check && make install
On a few of these, minor problems cropped up and were solved; they are discussed further below.
            I also made unsuccessful attempts to build SP
            with native C++ compilers on Hewlett-Packard HP-UX 10.0.1
            and Silicon Graphics IRIX 5.3, with a command line like
        
make CXX=CC CXXFLAGS=-O DEFINES='-DANSI_CLASS_INST $(XDEFINES)'
Numerous compiler errors quickly led to my abandoning the effort.
Compilation with native Sun Solaris 2.3 CC looked initially promising, but linking failed with errors about differing sizes of particular symbols, and with many missing functions arising from template instantiation. This linking problem is just what I found with SP 0.4 on the IBM RS/6000 AIX 3.2.5 systems too.
The make step completed successfully, but
            make check failed with a shell script error:
./dotest
sh: bad substitution
I simply switched shells from sh to GNU bash, instead of fiddling with the dotest script:
bash < dotest
The test completed successfully, and make install worked as expected.
        
        Mail from Michael Riedmann <Michael_Riedmann@hp.com> at Hewlett-Packard GmbH in Böblingen, Germany on 12 May 1998 reported a successful build of SP version 1.3 on HP-UX 10.20 with g++ version 2.7.2.3, after installing HP patch PHKL_8693 to fix a problem with a non-ANSI extern struct declaration in /usr/include/sys/time.h.
            Once the missing libiostream.a problem (see
            above) was solved, I was able to complete the first
            successful installation of SP on the IBM RS/6000. I was
            previously completely unable to get version 0.4 to build
            successfully with either g++ or native
            xlC.
        
I also tried a build with the native C++ compiler, using
make CXX=xlC WARN= DEFINES='-DANSI_CLASS_INST $(XDEFINES)' -i
This may be close to working: here are the compilation errors produced:
sp-1.0.1/entmgr:
xlC -ansi -I. -I./../lib -I./../entmgr -DANSI_CLASS_INST -c \
            ExtendEntityManager.C
"ExtendEntityManager.C", line 34.1: 1540-251: (S) The previous
            declaration of "memmove" did not have a linkage
            specification.
sp-1.0.1/app:
xlC -ansi -I. -I./../lib -I./../entmgr -I./../parser -I./../xentmgr \
            -DANSI_CLASS_INST -c LineOutputCodingSystem.C
"LineOutputCodingSystem.C", line 17.1: 1540-293: (W)
            "LineEncoder::output(const Char*,size_t,streambuf*)" hides
            the virtual function
            "Encoder::output(Char*,size_t,streambuf*)".
sp-1.0.1/nsgmls:
xlC -ansi -I. -I./../lib -I./../entmgr -I./../parser -I./../xentmgr \
            -I./../app -DANSI_CLASS_INST -c nsgmls.C
"nsgmls.C", line 77.1: 1540-055: (S) "char**" cannot be converted to
            "const char**".
"nsgmls.C", line 77.1: 1540-306: (I) The previous message applies to
            argument 2 of function "getopt(int,const char**,const
            char*)".
sp-1.0.1/spam:
xlC -ansi -I. -I./../lib -I./../entmgr -I./../parser -I./../xentmgr \
            -I./../app -DANSI_CLASS_INST -c spam.C
"spam.C", line 101.1: 1540-055: (S) "char**" cannot be converted to
            "const char**".
"spam.C", line 101.1: 1540-306: (I) The previous message applies to
            argument 2 of function "getopt(int,const char**,const
            char*)".
sp-1.0.1/sgmlnorm:
xlC -ansi -I. -I./../lib -I./../entmgr -I./../xentmgr -I./../app \
            -I./../api -DANSI_CLASS_INST -c sgmlnorm.C
"sgmlnorm.C", line 43.1: 1540-055: (S) "char**" cannot be converted to
            "const char**".
"sgmlnorm.C", line 43.1: 1540-306: (I) The previous message applies to
            argument 2 of function "getopt(int,const char**,const
            char*)".
sp-1.0.1/spent:
xlC -ansi -I. -I./../lib -I./../entmgr -I./../xentmgr -I./../app \
            -DANSI_CLASS_INST -c spent.C
"spent.C", line 54.1: 1540-055: (S) "char* const*" cannot be converted
            to "const char**".
"spent.C", line 54.1: 1540-306: (I) The previous message applies to
            argument 2 of function "getopt(int,const char**,const
            char*)".
        
            All of the errors about getopt() arise from
            confusion between const char** and
            char* const*.  The DEC Alpha OSF/1 3.x,
            Hewlett-Packard HP-UX 10.x, Silicon Graphics IRIX 5.x, and
            Sun Solaris 2.x header files stdlib.h have
            the latter, while the IBM RS/6000 stdlib.h
            file has the former.
        
            As an experiment, I therefore temporarily modified the
            file spent/spent.C to add a type cast
            (const char**) to the second argument of
            getopt(): compilation was then successful,
            but after adding a needed -L/usr/local/lib
            search path to the LIBS variable in the
            Makefile, linking failed with massive numbers
            of unresolved external names generated from templates.
            This is the same problem that existed with both g++
            2.6.3 and xlC with sp
            0.4, and I therefore abandoned further attempts
            with the xlC compiler.
        
            I modified the top-level SP Makefile
             to set RANLIB=ranlib. The build of
            SP then completed successfully, and make
            check passed all of the validation tests.
        
On Sun SunOS 4.1.3, the Makefile needs to have comment markers removed to generate the lines
LIBOBJS = strerror.o memmove.o
LIBS = -liostream -lg++ -L/usr/lang/SC1.0/ansi_lib -lansi
            Without the -lansi, function
            strtoul was not resolved from the C or C++
            libraries.  The Makefile comments
        
# On SunOS 4, using libg++ 2.6, uncomment this.
# libg++ is needed for strtoul which is used by libiostream.
# LIBS=-liostream -lg++
            incorrectly imply that strtoul can be found
            in libg++.a, but that is not the case.
            However, the function can be found in the library
            for the SunOS 4.x half-ANSI acc compiler.