91f19e Add support for IBM Z hardware-accelerated deflate

Authored and Committed by odubaj 2 months ago
    
    Future versions of IBM Z mainframes will provide the DFLTCC
    instruction, which implements the deflate algorithm in hardware. Its
    estimated compression and decompression performance is orders of
    magnitude faster than the current zlib, with a compression ratio
    comparable to that of level 1.
    
    This patch adds DFLTCC support to zlib. In order to enable it, the
    following build commands should be used:
    
        $ CFLAGS=-DDFLTCC ./configure
        $ make OBJA=dfltcc.o PIC_OBJA=dfltcc.lo
    
    When built like this, zlib compresses in hardware on level 1 and in
    software on all other levels. Decompression always happens in
    hardware. To enable DFLTCC compression for levels 1-6 (i.e. to make
    it the default), either add -DDFLTCC_LEVEL_MASK=0x7e at compile time
    or set the environment variable DFLTCC_LEVEL_MASK to 0x7e at run
    time.
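
    As an illustration of the mask value, the sketch below assumes that
    bit n of the mask selects compression level n, which is consistent
    with 0x7e covering levels 1-6. The helper names and the 0x2 default
    (level 1 only) are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed default: only bit 1 set, i.e. hardware compression on level 1. */
#define EXAMPLE_DEFAULT_LEVEL_MASK 0x2UL

/* Hypothetical helper: nonzero if `level` (0..9) is selected by `mask`,
   assuming bit n of the mask stands for level n. */
static int example_level_in_mask(unsigned long mask, int level)
{
    assert(level >= 0 && level <= 9);
    return (int)((mask >> level) & 1UL);
}

/* Hypothetical helper: parse a mask string such as "0x7e". A NULL or
   empty string yields the assumed default. Base 0 lets strtoul accept
   hexadecimal, octal and decimal notation. */
static unsigned long example_parse_level_mask(const char *s)
{
    if (s == NULL || *s == '\0')
        return EXAMPLE_DEFAULT_LEVEL_MASK;
    return strtoul(s, NULL, 0);
}
```

    A caller could pass getenv("DFLTCC_LEVEL_MASK") to
    example_parse_level_mask() and then test the requested level with
    example_level_in_mask().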
    
    Two DFLTCC compression calls produce the same results only when they
    both are made on machines of the same generation, and when the
    respective buffers have the same offset relative to the start of the
    page. Therefore care should be taken when using hardware compression
    when reproducible results are desired. One such use case - reproducible
    software builds - is handled explicitly: when the SOURCE_DATE_EPOCH
    environment variable is set, hardware compression is disabled.
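
    The two conditions above can be sketched as follows. The helper
    names are hypothetical, the 4096-byte page size is an assumption,
    and treating an empty SOURCE_DATE_EPOCH value as "set" is also an
    assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed page size; the real value depends on the platform. */
#define EXAMPLE_PAGE_SIZE 4096u

/* Hypothetical helper: offset of a buffer relative to the start of its
   page. Two buffers with equal offsets satisfy the second condition. */
static unsigned example_page_offset(const void *buf)
{
    return (unsigned)((uintptr_t)buf & (EXAMPLE_PAGE_SIZE - 1u));
}

/* Hypothetical helper: hardware compression is allowed only when
   SOURCE_DATE_EPOCH is not set. Pass the result of
   getenv("SOURCE_DATE_EPOCH"); any non-NULL value counts as set. */
static int example_hw_deflate_allowed(const char *source_date_epoch)
{
    return source_date_epoch == NULL;
}
```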
    
    DFLTCC does not support every single zlib feature, in particular:
    
        * inflate(Z_BLOCK) and inflate(Z_TREES)
        * inflateMark()
        * inflatePrime()
        * deflateParams() after the first deflate() call
    
    When used, these functions either switch to software or, if that is
    not possible, fail gracefully.
    
    This patch tries to add DFLTCC support in the least intrusive way.
    All SystemZ-specific code was placed into a separate file, but
    unfortunately there is still a noticeable amount of changes in the
    main zlib code. Below is a summary of those changes.
    
    DFLTCC takes as arguments a parameter block, an input buffer, an
    output buffer and a window. Since DFLTCC requires the parameter block
    to be doubleword-aligned, and it is reasonable to allocate it
    alongside the deflate and inflate states, the ZALLOC_STATE,
    ZFREE_STATE and ZCOPY_STATE macros were introduced to encapsulate the
    allocation details. The same is true for the window, for which the
    ZALLOC_WINDOW and TRY_FREE_WINDOW macros were introduced.
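
    One way to picture the "allocated alongside" layout is a wrapper
    struct that places an 8-byte-aligned block after the state. This is
    only a sketch: all type names are hypothetical stand-ins, the block
    size is a placeholder, and the patch's macros may do this
    differently.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for zlib's deflate or inflate state. */
struct example_state {
    int level;
    char flag;
};

/* Stand-in for the hardware parameter block; the size is a placeholder. */
struct example_param_block {
    unsigned char bytes[64];
};

/* State and parameter block in one allocation. alignas(8) forces the
   block onto a doubleword (8-byte) boundary within the struct. */
struct example_state_with_block {
    struct example_state state;
    alignas(8) struct example_param_block param;
};

/* Hypothetical allocator: one zeroed allocation holds both parts. */
static struct example_state_with_block *example_alloc_state(void)
{
    struct example_state_with_block *p = calloc(1, sizeof *p);
    /* calloc returns storage suitably aligned for the struct, so the
       member alignment carries over to the actual address. */
    assert(p == NULL || ((uintptr_t)&p->param % 8) == 0);
    return p;
}
```

    Keeping both parts in one struct means a single free() releases
    them together, and copying the struct copies the block as well.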
    
    While for inflate software and hardware window formats match, this is
    not the case for deflate. Therefore, deflateSetDictionary and
    deflateGetDictionary need special handling, which is triggered using the
    new DEFLATE_SET_DICTIONARY_HOOK and DEFLATE_GET_DICTIONARY_HOOK macros.
    
    deflateResetKeep() and inflateResetKeep() now update the DFLTCC
    parameter block, which is allocated alongside zlib state, using
    the new DEFLATE_RESET_KEEP_HOOK and INFLATE_RESET_KEEP_HOOK macros.
    
    In order to make unsupported deflateParams(), inflatePrime() and
    inflateMark() calls fail gracefully, the new DEFLATE_PARAMS_HOOK,
    INFLATE_PRIME_HOOK and INFLATE_MARK_HOOK macros were introduced.
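
    The general shape of such a hook can be sketched as a macro that
    expands to nothing in a software-only build and to an early return
    otherwise. Every name below is hypothetical; the sketch shows the
    pattern, not the patch's actual macro bodies.

```c
#include <stddef.h>

#define EXAMPLE_OK 0
#define EXAMPLE_STREAM_ERROR (-2)

/* Remove this define to model a build without hardware support. */
#define EXAMPLE_HW 1

/* Stand-in for a stream whose state may be owned by the hardware. */
struct example_stream {
    int hw_active;   /* nonzero once hardware has processed data */
    unsigned bits;   /* value that the call below would set */
};

#ifdef EXAMPLE_HW
/* Hardware build: refuse the call when it cannot be honoured. */
#define EXAMPLE_PRIME_HOOK(strm) \
    do { if ((strm)->hw_active) return EXAMPLE_STREAM_ERROR; } while (0)
#else
/* Software build: the hook compiles away. */
#define EXAMPLE_PRIME_HOOK(strm) do { } while (0)
#endif

/* Hypothetical API function with the hook placed before any state
   change, so a refused call leaves the stream untouched. */
static int example_prime(struct example_stream *strm, unsigned bits)
{
    if (strm == NULL)
        return EXAMPLE_STREAM_ERROR;
    EXAMPLE_PRIME_HOOK(strm);
    strm->bits = bits;
    return EXAMPLE_OK;
}
```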
    
    The algorithm implemented in hardware has a different compression
    ratio than the one implemented in software. In order for
    deflateBound() to return correct results for the hardware
    implementation, the new DEFLATE_BOUND_ADJUST_COMPLEN and
    DEFLATE_NEED_CONSERVATIVE_BOUND macros were introduced.
    
    Actual compression and decompression are handled by the new DEFLATE_HOOK
    and INFLATE_TYPEDO_HOOK macros. Since inflation with DFLTCC manages the
    window on its own, calling updatewindow() is suppressed using the new
    INFLATE_NEED_UPDATEWINDOW() macro.
    
    In addition to compression, DFLTCC computes CRC-32 and Adler-32
    checksums, therefore, whenever it's used, software checksumming needs to
    be suppressed using the new DEFLATE_NEED_CHECKSUM and
    INFLATE_NEED_CHECKSUM macros.
    
    DFLTCC will refuse to write an End-of-block Symbol if there is no
    input data, thus in some cases it is necessary to do this manually.
    To achieve this, send_bits, bi_reverse, bi_windup and flush_pending
    were promoted from local to ZLIB_INTERNAL. Furthermore, since block
    and stream termination must be handled in software as well, the
    block_state enum was moved to deflate.h.
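
    To show what writing an End-of-block Symbol by hand involves, here
    is a standalone sketch that emits an empty final block using the
    fixed Huffman code from RFC 1951. The bit writer and all names are
    hypothetical simplifications and do not reproduce zlib's send_bits
    or bi_windup.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal LSB-first bit writer (hypothetical, not zlib's). */
struct example_bitwriter {
    uint8_t *out;
    size_t cap;
    size_t len;
    uint32_t bitbuf;   /* pending bits, fewer than 8 between calls */
    int bitcnt;
};

/* Append the low `nbits` bits of `value`. Returns -1 on bad arguments
   or when fewer than two bytes of output space remain. */
static int example_send_bits(struct example_bitwriter *w,
                             uint32_t value, int nbits)
{
    if (nbits < 0 || nbits > 16 || w->cap - w->len < 2)
        return -1;
    w->bitbuf |= value << w->bitcnt;
    w->bitcnt += nbits;
    while (w->bitcnt >= 8) {
        w->out[w->len++] = (uint8_t)(w->bitbuf & 0xff);
        w->bitbuf >>= 8;
        w->bitcnt -= 8;
    }
    return 0;
}

/* Flush leftover bits, padding the last byte with zeros. */
static int example_windup(struct example_bitwriter *w)
{
    if (w->bitcnt > 0) {
        if (w->len >= w->cap)
            return -1;
        w->out[w->len++] = (uint8_t)(w->bitbuf & 0xff);
    }
    w->bitbuf = 0;
    w->bitcnt = 0;
    return 0;
}

/* Emit BFINAL=1, BTYPE=01 (fixed Huffman), then the 7-bit
   end-of-block code. That code is all zeros, so bit reversal of the
   Huffman code makes no difference here. */
static int example_write_empty_final_block(struct example_bitwriter *w)
{
    if (example_send_bits(w, 1, 1) != 0) return -1;
    if (example_send_bits(w, 1, 2) != 0) return -1;
    if (example_send_bits(w, 0, 7) != 0) return -1;
    return example_windup(w);
}
```

    With an empty writer this produces the two bytes 0x03 0x00, the
    usual encoding of an empty final deflate block.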
    
    Since the first call to dfltcc_inflate already needs the window, and
    it might not be allocated yet, inflate_ensure_window was factored out
    of updatewindow and made ZLIB_INTERNAL.
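
    The idea of an "ensure window" step is lazy allocation: allocate on
    first use and do nothing on later calls. A minimal sketch with a
    hypothetical state struct and helper name, assuming a window of
    2^wbits bytes:

```c
#include <stdlib.h>

/* Stand-in for the parts of the inflate state that matter here. */
struct example_inflate_state {
    unsigned wbits;          /* log2 of the window size */
    unsigned char *window;   /* NULL until first needed */
    unsigned wsize;          /* window size in bytes, 0 until allocated */
};

/* Hypothetical helper: allocate the window if it does not exist yet.
   Returns 0 on success and 1 on allocation failure. */
static int example_ensure_window(struct example_inflate_state *s)
{
    if (s->window == NULL) {
        s->window = calloc((size_t)1 << s->wbits, 1);
        if (s->window == NULL)
            return 1;
        s->wsize = 1U << s->wbits;
    }
    return 0;
}
```

    Because the helper is idempotent, both the software path and a
    hardware path can call it before touching the window.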
    
        
file modified
+8 -1