15. Compression

Module Gz

Description

The Gz module contains functions to compress and uncompress strings using the same algorithm as the program gzip. Compressing can be done in streaming mode or all at once.

The Gz module consists of two classes: Gz.deflate and Gz.inflate. Gz.deflate is used to pack data and Gz.inflate is used to unpack data (think "inflatable boat").

Note

Note that this module is only available if the gzip library was available when Pike was compiled.

Note that although these functions use the same algorithm as gzip, they do not produce the exact same format, so you cannot directly unzip gzipped files with these routines. Use Gz.File to read and write gzip files.


Method adler32

int adler32(string(8bit) data, void|int(0..) start_value)

Description

This function calculates the Adler-32 checksum of data.
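A minimal sketch of computing Adler-32 incrementally. It assumes that start_value continues a running checksum the way zlib's adler32() does, which the description above does not state explicitly; the helper adler_in_two_parts is hypothetical.

```pike
// Hypothetical helper: checksum two chunks as if they were one string,
// assuming start_value carries on from a previous Gz.adler32() result.
int adler_in_two_parts(string(8bit) first, string(8bit) second)
{
  int running = Gz.adler32(first);     // checksum of the first chunk
  return Gz.adler32(second, running);  // continue over the second chunk
}

int main()
{
  string(8bit) data = "Wikipedia";
  write("one call:  %08x\n", Gz.adler32(data));
  // data[..3] is "Wiki" and data[4..] is "pedia" (Pike ranges are inclusive).
  write("two calls: %08x\n", adler_in_two_parts(data[..3], data[4..]));
  return 0;
}
```

Both lines should print the same value.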


Method compress

string(8bit) compress(string(8bit)|String.Buffer|System.Memory|Stdio.Buffer data, void|bool raw, void|int(0..9) level, void|int strategy, void|int(8..15) window_size)

Description

Encodes and returns the input data according to the deflate format defined in RFC 1951.

Parameter data

The data to be encoded.

Parameter raw

If set, the data is encoded without the header and footer defined in RFC 1950. The ZIP container format is one example of where this is used.

Parameter level

Indicates the level of effort spent to make the data compress well. Zero means no packing, 2-3 is considered 'fast', 8 is default and higher is considered 'slow' but gives better packing.

Parameter strategy

The strategy to be used when compressing the data. One of the following.

DEFAULT_STRATEGY

The default strategy as selected in the zlib library.

FILTERED

This strategy is intended for data produced by a filter or predictor and puts more emphasis on Huffman encoding and less on LZ string matching. It lies between DEFAULT_STRATEGY and HUFFMAN_ONLY.

RLE

This strategy is even closer to HUFFMAN_ONLY in that it only looks at the latest byte in the window, i.e. a window size of 1 byte is sufficient for decompression. This mode is not available in all zlib versions.

HUFFMAN_ONLY

This strategy turns off string matching completely and only does Huffman encoding. Window size doesn't matter in this mode, and the data can be decompressed with a zero-size window.

FIXED

In this mode dynamic Huffman codes are disabled, allowing for a simpler decoder for special applications. This mode is not available in all zlib versions.

Parameter window_size

Defines the size of the LZ77 window, from 256 to 32768 bytes, given as its base-2 logarithm x (window size 2^x): 8 means 256 bytes and 15 means 32768 bytes.

See also

deflate, inflate, uncompress
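A short sketch comparing output sizes for a few values of level, and the effect of raw. The helper sizes_by_level and the sample text are illustrative only; the 6-byte difference assumes the usual zlib wrapper of a 2-byte header plus a 4-byte Adler-32 trailer.

```pike
// Hypothetical helper: zlib-wrapped output size for a few compression levels.
mapping(int:int) sizes_by_level(string(8bit) data)
{
  mapping(int:int) sizes = ([]);
  foreach (({ 0, 1, 6, 9 }), int level)
    sizes[level] = sizeof(Gz.compress(data, 0, level));
  return sizes;
}

int main()
{
  string(8bit) text = "All work and no play makes Jack a dull boy. " * 200;
  mapping(int:int) sizes = sizes_by_level(text);
  foreach (sort(indices(sizes)), int level)
    write("level %d: %d -> %d bytes\n", level, sizeof(text), sizes[level]);

  // raw = 1 leaves out the 2-byte RFC 1950 header and the 4-byte Adler-32
  // trailer, so this should be 6 bytes shorter than the level 9 line above.
  write("raw, level 9: %d bytes\n", sizeof(Gz.compress(text, 1, 9)));
  return 0;
}
```

Level 0 output is expected to be slightly larger than the input, since the data is only stored.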


Method crc32

int crc32(string(8bit) data, void|int(0..) start_value)

Description

This function calculates the standard ISO 3309 cyclic redundancy check (CRC-32) of data.
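A sketch of one-shot and chunked CRC-32 computation, assuming start_value resumes a running CRC as in zlib's crc32() and that the default start value is 0. The helper crc32_chunks is hypothetical, and the & 0xffffffff mask is only a precaution in case a build returns a signed value.

```pike
// Hypothetical helper: CRC-32 over a list of chunks, feeding each previous
// result back in as start_value (0 is the conventional initial value).
int crc32_chunks(array(string(8bit)) chunks)
{
  int crc = 0;
  foreach (chunks, string(8bit) chunk)
    crc = Gz.crc32(chunk, crc);
  return crc;
}

int main()
{
  // "123456789" is the usual CRC-32 check input; its CRC-32 is cbf43926.
  write("%08x\n", Gz.crc32("123456789") & 0xffffffff);
  write("%08x\n", crc32_chunks(({ "1234", "56789" })) & 0xffffffff);
  return 0;
}
```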


Method uncompress

string(8bit) uncompress(string(8bit)|String.Buffer|System.Memory|Stdio.Buffer data, void|bool raw)

Description

Uncompresses the data and returns it. The raw parameter tells the decoder that the input lacks the header and footer defined in RFC 1950, as is the case for output from compress() with raw set.
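A minimal round trip through compress() and uncompress(). The main point is that raw must have the same value on both sides; the helper round_trip is hypothetical.

```pike
// Hypothetical helper: compress and uncompress with the same raw setting.
string(8bit) round_trip(string(8bit) data, int(0..1) raw)
{
  string(8bit) packed = Gz.compress(data, raw);  // raw: no RFC 1950 wrapper
  return Gz.uncompress(packed, raw);             // raw must match compress()
}

int main()
{
  string(8bit) text = "Hello, Pike! " * 50;
  write("zlib-wrapped ok: %d\n", round_trip(text, 0) == text);
  write("raw deflate ok:  %d\n", round_trip(text, 1) == text);
  return 0;
}
```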

Class Gz.File


Method create

Gz.File Gz.File(void|string|int|Stdio.Stream file, void|string mode)

Parameter file

Filename or filedescriptor of the gzip file to open, or an already open Stream.

Parameter mode

Mode for the file. Defaults to "rb".

See also

open, Stdio.File


Inherit _file

inherit ._file : _file

Description

Allows the user to open a Gzip archive and read and write its contents in uncompressed form, emulating the Stdio.File interface.

Note

An important limitation of this class is that it may only be used for reading or writing, not both at the same time. Also note that if you want to reopen a file for reading after a write, you must close the file before calling open(), or strange effects may result.
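A sketch of writing a gzip file and reading it back with the same object, closing it in between as the note requires. The scratch file name and the helper gz_file_round_trip are hypothetical; the "9" in the mode selects the compression level, as described for Gz._file()->open().

```pike
// Hypothetical helper: write data to a gzip file, read it back, remove the file.
string gz_file_round_trip(string path, string data)
{
  Gz.File f = Gz.File(path, "wb9");  // write mode, compression level 9
  f->write(data);
  f->close();                        // close before reopening for reading

  f->open(path, "rb");
  string res = f->read();            // no argument: read the whole file
  f->close();
  rm(path);                          // clean up the scratch file
  return res;
}

int main()
{
  // "example.txt.gz" is an arbitrary scratch file name.
  write("%s", gz_file_round_trip("example.txt.gz", "first line\nsecond line\n"));
  return 0;
}
```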


Method line_iterator

String.SplitIterator|Stdio.LineIterator line_iterator(int|void trim)

Description

Returns an iterator that will loop over the lines in this file. If trim is true, all '\r' characters will be removed from the input.


Method open

int open(string|int|Stdio.Stream file, void|string mode)

Parameter file

Filename or filedescriptor of the gzip file to open, or an already open Stream.

Parameter mode

Mode for the file. Defaults to "rb". May be one of the following:

rb

read mode

wb

write mode

ab

append mode

For the wb and ab modes, additional mode characters may be specified. Please see the zlib manual for more information.

Returns

non-zero if successful.


Method read

int|string read(void|int length)

Description

Reads data from the file. If no argument is given, the whole file is read.


Method read_function

function(:string) read_function(int nbytes)

Description

Returns a function that, when called, calls read() with nbytes as argument. This can be used to get various callback functions, e.g. for the fourth argument to String.SplitIterator.

Class Gz._file

Description

Low-level implementation of read/write support for GZip files


Method close

int close()

Description

Closes the file.

Returns

1 if successful


Method create

Gz._file Gz._file(void|string|Stdio.Stream gzFile, void|string mode)

Description

Opens a gzip file for reading.


Method eof

bool eof()

Returns

1 if EOF has been reached.


Method open

int open(string|int|Stdio.Stream file, void|string mode)

Description

Opens a file for I/O.

Parameter file

The filename or an open filedescriptor or Stream for the GZip file to use.

Parameter mode

Mode for the file operations. Defaults to read only. The following mode characters are unique to Gz.File.

"0" .. "9"

Values 0 to 9 set the compression level from no compression to maximum available compression. Defaults to 6.

"f"

Sets the compression strategy to FILTERED.

"h"

Sets the compression strategy to HUFFMAN_ONLY.

Note

If the object already has been opened, it will first be closed.


Method read

int|string read(int len)

Description

Reads len (uncompressed) bytes from the file. If read is unsuccessful, 0 is returned.


Method seek

int seek(int pos, void|int type)

Description

Seeks within the file.

Parameter pos

Position relative to the reference point selected by type.

Parameter type

SEEK_SET

Set the current position in the file to pos.

SEEK_CUR

The new position is the current position plus pos.

SEEK_END is not supported.

Returns

New position or negative number if seek failed.


Method setparams

int setparams(void|int(0..9) level, void|int strategy, void|int(8..15) window_size)

Description

Sets the encoding level, strategy and window_size.

See also

Gz.deflate


Method tell

int tell()

Returns

the current position within the file.


Method write

int write(string data)

Description

Writes the data to the file.

Returns

the number of bytes written to the file.

Class Gz.deflate

Description

This class interfaces with the compression routines in the libz library.

Note

This class is only available if libz was available and found when Pike was compiled.

See also

Gz.inflate(), Gz.compress(), Gz.uncompress()


Method clone

Gz.deflate clone()

Description

Clones the deflate object. This is typically used to test how new content would compress from exactly the same state, without affecting the original object.
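A sketch of the use case described above: a common prefix is compressed once, and each candidate suffix is tried on a clone, so the original object keeps its state. The helper finish_with_each and the sample strings are hypothetical.

```pike
// Hypothetical helper: compress a shared prefix once, then finish a clone of
// the deflate state with each candidate suffix. base itself is never finished.
mapping(string:string(8bit)) finish_with_each(string(8bit) prefix,
                                              array(string(8bit)) suffixes)
{
  Gz.deflate base = Gz.deflate(6);
  // With NO_FLUSH only output that no longer fits the internal buffers is
  // returned here; the rest is emitted by each clone when it is finished.
  string(8bit) head = base->deflate(prefix, Gz.NO_FLUSH);
  mapping(string:string(8bit)) res = ([]);
  foreach (suffixes, string(8bit) s)
    res[s] = head + base->clone()->deflate(s, Gz.FINISH);
  return res;
}

int main()
{
  mapping(string:string(8bit)) out =
    finish_with_each("GET /index.html HTTP/1.1\r\n",
                     ({ "Host: a\r\n", "Accept: */*\r\n" }));
  foreach (sort(indices(out)), string s)
    write("%O: %d bytes\n", s, sizeof(out[s]));
  return 0;
}
```

Each value in the result is a complete stream that Gz.inflate()->inflate() turns back into prefix + suffix.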


Method create

Gz.deflate Gz.deflate(int(-9..9)|void level, int|void strategy, int(8..15)|void window_size)
Gz.deflate Gz.deflate(mapping options)

Description

This function can also be used to re-initialize a Gz.deflate object so it can be re-used.

If a mapping is passed as the only argument, it accepts the parameters described below as indices, and additionally a "dictionary" index whose value is a string to be used as the dictionary.

Parameter level

Indicates the level of effort spent to make the data compress well. Zero means no packing, 2-3 is considered 'fast', 6 is default and higher is considered 'slow' but gives better packing.

If the argument is negative, no headers will be emitted. This is needed to produce ZIP files, for example. The negative value is then negated and handled as a positive value.

Parameter strategy

The strategy to be used when compressing the data. One of the following.

DEFAULT_STRATEGY

The default strategy as selected in the zlib library.

FILTERED

This strategy is intended for data produced by a filter or predictor and puts more emphasis on Huffman encoding and less on LZ string matching. It lies between DEFAULT_STRATEGY and HUFFMAN_ONLY.

RLE

This strategy is even closer to HUFFMAN_ONLY in that it only looks at the latest byte in the window, i.e. a window size of 1 byte is sufficient for decompression. This mode is not available in all zlib versions.

HUFFMAN_ONLY

This strategy turns off string matching completely and only does Huffman encoding. Window size doesn't matter in this mode, and the data can be decompressed with a zero-size window.

FIXED

In this mode dynamic Huffman codes are disabled, allowing for a simpler decoder for special applications. This mode is not available in all zlib versions.

Parameter window_size

Defines the size of the LZ77 window, from 256 to 32768 bytes, given as its base-2 logarithm x (window size 2^x): 8 means 256 bytes and 15 means 32768 bytes.
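A sketch of the negative-level convention: a deflate object created with a negative level produces a raw stream, which an inflate object created with a negative window_size can read. The helpers raw_pack and raw_unpack are hypothetical.

```pike
// Hypothetical helpers for raw deflate data (no RFC 1950 header or trailer),
// the form stored inside ZIP archives.
string(8bit) raw_pack(string(8bit) data, int(1..9) level)
{
  // A negative level means "emit no headers"; its magnitude is the level.
  return Gz.deflate(-level)->deflate(data, Gz.FINISH);
}

string(8bit) raw_unpack(string(8bit) packed)
{
  // A negative window_size makes inflate skip the header checks.
  return Gz.inflate(-15)->inflate(packed);
}

int main()
{
  string(8bit) text = "raw deflate example " * 20;
  string(8bit) packed = raw_pack(text, 9);
  write("%d -> %d bytes, ok: %d\n", sizeof(text), sizeof(packed),
        raw_unpack(packed) == text);
  return 0;
}
```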


Method deflate

string(8bit) deflate(string(8bit)|String.Buffer|System.Memory|Stdio.Buffer data, int|void flush)

Description

This function performs gzip style compression on data and returns the packed data. Streaming can be done by calling this function several times and concatenating the returned data.

The optional argument flush should be one of the following:

Gz.NO_FLUSH

Only data that doesn't fit in the internal buffers is returned.

Gz.PARTIAL_FLUSH

All input is packed and returned.

Gz.SYNC_FLUSH

All input is packed and returned.

Gz.FINISH

All input is packed and an 'end of data' marker is appended (default).

See also

Gz.inflate->inflate()
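A sketch of streaming compression: every chunk is passed with Gz.NO_FLUSH, and a final call with Gz.FINISH appends the end-of-data marker. It assumes deflate() accepts an empty string together with Gz.FINISH; the helper deflate_chunks is hypothetical.

```pike
// Hypothetical helper: compress several chunks as one stream. NO_FLUSH lets
// zlib keep data buffered between calls; the final FINISH writes the end marker.
string(8bit) deflate_chunks(array(string(8bit)) chunks)
{
  Gz.deflate d = Gz.deflate();
  String.Buffer out = String.Buffer();
  foreach (chunks, string(8bit) chunk)
    out->add(d->deflate(chunk, Gz.NO_FLUSH));
  out->add(d->deflate("", Gz.FINISH));
  return out->get();
}

int main()
{
  string(8bit) packed = deflate_chunks(({ "alpha ", "beta ", "gamma" }));
  write("%d bytes, ok: %d\n", sizeof(packed),
        Gz.inflate()->inflate(packed) == "alpha beta gamma");
  return 0;
}
```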

Class Gz.inflate

Description

This class interfaces with the uncompression routines in the libz library.

Note

This program is only available if libz was available and found when Pike was compiled.

See also

deflate, compress, uncompress


Method create

Gz.inflate Gz.inflate(int|void window_size)
Gz.inflate Gz.inflate(mapping options)

Description

If called with a mapping as the only argument, create() accepts the entries window_size (described below) and dictionary, which is a string to be set as the dictionary.

The window_size value is passed down to inflateInit2 in zlib.

If the argument is negative, no header checks are done, and no verification of the data is done either. This is needed for uncompressing ZIP files, for example. The negative value is then negated and handled as a positive value.

Positive arguments set the maximum window size as a power of 2: 8 (the minimum) gives a window of 256 bytes, and 15 (the maximum, and default value) gives a window of 32 KB. Setting this to anything except 15 is rather pointless in Pike.

It can be used to limit the amount of memory used to uncompress files, but 32 KB is not all that much in the grand scheme of things.

The window used for decompression must be at least as large as the one used when the data was compressed, regardless of the compression level. Data compressed with the default window_size of 15 therefore needs a 32 KB window.

If the options version is used you can specify your own dictionary in addition to the window size.

dictionary : string
window_size : int
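A sketch of using a preset dictionary through the mapping forms of create(). It assumes that the inflate object supplies its dictionary when zlib asks for one while decoding a zlib-wrapped stream. DICT and the helper functions are hypothetical, and a dictionary only helps when the data shares substrings with it.

```pike
// Example dictionary; both sides must use exactly the same string.
constant DICT = "content-type: text/html; charset=utf-8\r\ncontent-length: ";

string(8bit) pack_with_dict(string(8bit) data)
{
  // The mapping form of create() takes the create() parameters plus "dictionary".
  return Gz.deflate(([ "level": 9, "dictionary": DICT ]))->deflate(data, Gz.FINISH);
}

string(8bit) unpack_with_dict(string(8bit) packed)
{
  return Gz.inflate(([ "dictionary": DICT ]))->inflate(packed);
}

int main()
{
  string(8bit) msg =
    "content-type: text/html; charset=utf-8\r\ncontent-length: 42\r\n";
  string(8bit) with_dict = pack_with_dict(msg);
  write("with dictionary: %d bytes, without: %d bytes\n",
        sizeof(with_dict), sizeof(Gz.compress(msg)));
  write("round trip ok: %d\n", unpack_with_dict(with_dict) == msg);
  return 0;
}
```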

Method end_of_stream

string(8bit) end_of_stream()

Description

This function returns 0 if the end-of-stream marker has not yet been encountered, or a string (possibly empty) containing any extra data received after the marker if it has been encountered. If the extra data is not needed, the result of this function can be treated as a logical value.
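A sketch of using end_of_stream() to split a buffer that holds one compressed stream followed by unrelated bytes. The helper split_stream is hypothetical.

```pike
// Hypothetical helper: separate one zlib stream from the bytes that follow it.
array(string(8bit)) split_stream(string(8bit) buf)
{
  Gz.inflate inf = Gz.inflate();
  string(8bit) body = inf->inflate(buf);
  string(8bit) rest = inf->end_of_stream();  // 0 until the end marker is seen
  if (!rest)                                 // "" is true in Pike, 0 is not
    error("Compressed stream is incomplete.\n");
  return ({ body, rest });
}

int main()
{
  array(string(8bit)) parts = split_stream(Gz.compress("payload") + "TRAILER");
  write("body: %O, trailing: %O\n", parts[0], parts[1]);
  return 0;
}
```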


Method inflate

string(8bit) inflate(string(8bit)|String.Buffer|System.Memory|Stdio.Buffer data)

Description

This function performs gzip style decompression. It can inflate a whole file at once or in blocks.

Example

// whole file
write(Gz.inflate()->inflate(stdin->read()));
// streaming (blocks); stop on "" since an empty string is true in Pike
function inflate = Gz.inflate()->inflate;
string s;
while ((s = stdin->read(8192)) && sizeof(s))
  write(inflate(s));

See also

Gz.deflate->deflate(), Gz.uncompress()

Module Bz2

Description

The Bz2 module contains functions to compress and uncompress strings using the same algorithm as the program bzip2. Compressing and decompressing can be done in streaming mode feeding the compress and decompress objects with arbitrarily large pieces of data.

The Bz2 module consists of three classes: Bz2.Deflate, Bz2.Inflate and Bz2.File. Bz2.Deflate is used to compress data and Bz2.Inflate is used to uncompress data. Bz2.File is used to handle Bzip2 files.

Note

Note that this module is only available if libbzip2 was available when Pike was compiled.

Note that although the functions in Inflate and Deflate use the same algorithm as bzip2, they do not use the exact same format, so you cannot directly compress or uncompress bzip2 files using those functions. That is what the third class, Bz2.File, is for.


Inherit Bz2

inherit "___Bz2" : Bz2

Class Bz2.Deflate

Description

Bz2.Deflate is a builtin program written in C. It interfaces the packing routines in the bzlib library.

Note

This program is only available if libbzip2 was available and found when Pike was compiled.

See also

Bz2.Inflate()


Method create

Bz2.Deflate Bz2.Deflate(int(1..9)|void block_size)

Description

If given, block_size should be a number from 1 to 9 indicating the block size used when compressing. The actual block size is 100000 times this number. Low numbers are considered 'fast'; higher numbers are considered 'slow' but give better packing. The parameter defaults to 9 if omitted.

This function can also be used to re-initialize a Bz2.Deflate object so it can be re-used.


Method deflate

string deflate(string data, int(0..2)|void flush_mode)

Description

This function performs bzip2 style compression on a string data and returns the packed data. Streaming can be done by calling this function several times and concatenating the returned data.

The optional argument flush_mode should be one of the following:

Bz2.BZ_RUN

Runs Bz2.Deflate->feed()

Bz2.BZ_FLUSH

Runs Bz2.Deflate->read()

Bz2.BZ_FINISH

Runs Bz2.Deflate->finish()

See also

Bz2.Inflate->inflate()


Method feed

void feed(string data)

Description

This function feeds the data to the internal buffers of the Deflate object. All data is buffered until a read or a finish is done.

See also

Bz2.Deflate->read(), Bz2.Deflate->finish()


Method finish

string finish(string data)

Description

This method feeds the data to the internal buffers of the Deflate object. It then compresses all buffered data, appends an end-of-data marker, returns the compressed data as a string, and reinitializes the deflate object.

See also

Bz2.Deflate->feed(), Bz2.Deflate->read()


Method read

string read(string data)

Description

This function feeds the data to the internal buffers of the Deflate object. It then compresses all buffered data and returns the compressed data as a string.

See also

Bz2.Deflate->feed(), Bz2.Deflate->finish()
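A sketch of the buffered Bz2.Deflate interface: pieces are collected with feed(), and finish() compresses them and adds the end-of-data marker (read() could be called in between to collect output early). The helper bz2_pack_pieces is hypothetical, and passing an empty string to finish() is an assumption.

```pike
// Hypothetical helper: build a single bzip2 stream from several pieces.
string bz2_pack_pieces(array(string) pieces)
{
  Bz2.Deflate d = Bz2.Deflate(9);  // 9 x 100000-byte blocks, the default
  foreach (pieces, string p)
    d->feed(p);                    // only buffered, nothing is returned
  return d->finish("");            // compress everything, add the end marker
}

int main()
{
  string packed = bz2_pack_pieces(({ "first piece, ", "second piece, ", "third piece" }));
  write("%d bytes, ok: %d\n", sizeof(packed),
        Bz2.Inflate()->inflate(packed) == "first piece, second piece, third piece");
  return 0;
}
```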

Class Bz2.File

Description

Low-level implementation of read/write support for Bzip2 files

Note

This class is currently not available on Windows.


Method close

bool close()

Description

Closes the file.


Method create

Bz2.File Bz2.File()
Bz2.File Bz2.File(string filename, void|string mode)

Description

Creates a Bz2.File object


Method eof

bool eof()

Returns

1 if EOF has been reached, 0 otherwise


Inherit File

inherit Bz2::File : File


Method line_iterator

String.SplitIterator|Stdio.LineIterator line_iterator(int|void trim)

Description

Returns an iterator that will loop over the lines in this file. If trim is true, all '\r' characters will be removed from the input.


Method open

bool open(string file, void|string mode)

Description

Opens a file for I/O.

Parameter file

The name of the file to be opened

Parameter mode

Mode for the file operations. Can be either "r" (read) or "w" (write). Read is the default.


Method read

string read(int len)

Description

Reads len (uncompressed) bytes from the file. If len is omitted the whole file is read. If read is unsuccessful, 0 is returned.


Method read_function

function(:string) read_function(int nbytes)

Description

Returns a function that, when called, calls read() with nbytes as argument. This can be used to get various callback functions, e.g. for the fourth argument to String.SplitIterator.


Method read_open

bool read_open(string file)

Description

Opens a file for reading.

Parameter file

The name of the file to be opened


Method write

int write(string data)

Description

Writes the data to the file.

Returns

the number of bytes written to the file.


Method write_open

bool write_open(string file)

Description

Opens a file for writing.

Parameter file

The name of the file to be opened
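A sketch of writing and reading a bzip2 file with write_open() and read_open(). It uses two separate objects rather than reopening one, and relies on read() without a length reading the whole file, as documented for read() above. The helper name and the scratch file name are hypothetical.

```pike
// Hypothetical helper: write data to a .bz2 file, read it back, remove the file.
string bz2_file_round_trip(string path, string data)
{
  Bz2.File writer = Bz2.File();
  if (!writer->write_open(path)) error("Cannot open %O for writing.\n", path);
  writer->write(data);
  writer->close();

  Bz2.File reader = Bz2.File();
  if (!reader->read_open(path)) error("Cannot open %O for reading.\n", path);
  string res = reader->read();   // len omitted: read the whole file
  reader->close();
  rm(path);                      // clean up the scratch file
  return res;
}

int main()
{
  // "example.txt.bz2" is an arbitrary scratch file name.
  write("%s", bz2_file_round_trip("example.txt.bz2", "first line\nsecond line\n"));
  return 0;
}
```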

Class Bz2.Inflate

Description

Bz2.Inflate is a builtin program written in C. It interfaces the unpacking routines in the bzlib library.

Note

This program is only available if bzlib was available and found when Pike was compiled.

See also

Deflate


Method create

Bz2.Inflate Bz2.Inflate()


Method inflate

string inflate(string data)

Description

This function performs bzip2 style decompression. It can do decompression with arbitrarily large pieces of data. When fed with data, it decompresses as much as it can and buffers the rest.

Example

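Bz2.Inflate inflate_object = Bz2.Inflate();
string uncompressed_concatenated_data = "";
for (int i = 0; i < sizeof(compressed_data); i += 10)
  uncompressed_concatenated_data += inflate_object->inflate(compressed_data[i..i+9]);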

See also

Bz2.Deflate->deflate()

Module HPack

Description

Implementation of the HPACK (RFC 7541) header packing standard.

This is the header packing system that is used in HTTP/2 (RFC 7540).


Constant DEFAULT_HEADER_TABLE_SIZE

constant int HPack.DEFAULT_HEADER_TABLE_SIZE

Description

This is the default static maximum size of the dynamic header table.

This constant is taken from RFC 7540 section 6.5.2.


Method create_index

protected mapping(string(8bit):int|mapping(string(8bit):int)) create_index(array(array(string(8bit))) tab)

Description

Helper function used to create the static_header_index.


Method huffman_decode

string(8bit) huffman_decode(string(8bit) str)

Description

Decodes the string str encoded with the static Huffman code specified in RFC 7541 appendix B.

Parameter str

String to decode.

Returns

Returns the decoded string.

See also

huffman_encode().


Method huffman_encode

string(8bit) huffman_encode(string(8bit) str)

Description

Encodes the string str with the static Huffman code specified in RFC 7541 appendix B.

Parameter str

String to encode.

Returns

Returns the encoded string.

See also

huffman_decode().
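A quick check against the Huffman example in RFC 7541 appendix C.4.1, where "www.example.com" encodes to the 12 bytes shown in the comment.

```pike
int main()
{
  string(8bit) enc = HPack.huffman_encode("www.example.com");
  // RFC 7541 C.4.1: f1e3c2e5f23a6ba0ab90f4ff (12 bytes instead of 15).
  write("%s (%d bytes)\n", String.string2hex(enc), sizeof(enc));
  write("%s\n", HPack.huffman_decode(enc));  // back to the original string
  return 0;
}
```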


Inherit "___HPack"

inherit "___HPack" : "___HPack"


Variable static_header_index

protected mapping(string(8bit):int|mapping(string(8bit):int)) HPack.static_header_index

Description

Index for static_header_tab.

Note

Note that the indices are offset by 1 (one).

Note

This variable should be regarded as a constant.

This variable is used to initialize the header index in the Context.

See also

static_header_tab, Context()->header_index


Constant static_header_tab

constant HPack.static_header_tab

Description

Table of static headers. RFC 7541 appendix A, Table 1.

Array

array(string(8bit)) 0..60

Each entry is in turn an array:

string(8bit) 0

Header name.

string(8bit) 1

Default value.

Note

Note that this table is indexed starting on 0 (zero), while the corresponding table in RFC 7541 starts on 1 (one).
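A small sketch of the off-by-one relationship between RFC 7541 index numbers and positions in static_header_tab. RFC 7541 defines 61 static entries and gives index 2 as :method GET.

```pike
int main()
{
  // RFC 7541 index 2 (":method: GET") is stored at position 2 - 1 = 1.
  array(string(8bit)) entry = HPack.static_header_tab[2 - 1];
  write("%s: %s\n", entry[0], entry[1]);
  write("%d static entries\n", sizeof(HPack.static_header_tab));
  return 0;
}
```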


Method update_index

protected void update_index(mapping(string(8bit):int|mapping(string(8bit):int)) index, int i, array(string(8bit)) key)

Description

Update the specified encoder lookup index.

Parameter index

Lookup index to add an entry to.

Parameter key

Lookup key to add.

Parameter i

Value to store in the index for the key.

Enum HPack.HPackFlags

Description

Flags for Context()->encode_header() et al.


Constant HEADER_INDEXED

constant HPack.HEADER_INDEXED

Description

Indexed header.


Constant HEADER_INDEXED_MASK

constant HPack.HEADER_INDEXED_MASK

Description

Bitmask for indexing mode.


Constant HEADER_NEVER_INDEXED

constant HPack.HEADER_NEVER_INDEXED

Description

Never indexed header.


Constant HEADER_NOT_INDEXED

constant HPack.HEADER_NOT_INDEXED

Description

Unindexed header.

Class HPack.Context

Description

Context for an HPack encoder or decoder.

This class implements the majority of RFC 7541.

Functions of interest are typically encode() and decode().


Method add_header

int(0..0)|int(62..62) add_header(string(8bit) header, string(8bit) value)

Description

Add a header to the table of known headers and to the header index.

Parameter header

Name of header to add.

Parameter value

Value of the header.

Returns

Returns 0 (zero) if the header was too large to store. On success, returns the encoding key for the header, which is always sizeof(static_header_tab) + 1 (i.e. 62), since new headers are prepended to the dynamic header table.

Note

Adding a header may cause old headers to be evicted from the table.

See also

get_indexed_header()
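A sketch of adding a header and looking it up again by its encoding key. Per the description above, the newest dynamic entry gets key 62, while keys 1 to 61 refer to the static table (RFC 7541 gives index 2 as :method GET). The header name and value are arbitrary.

```pike
int main()
{
  HPack.Context ctx = HPack.Context();

  // add_header() returns 0 if the entry is too large, else its encoding key.
  int key = ctx->add_header("x-request-id", "42");
  write("key: %d\n", key);                      // expected: 62

  write("%O\n", ctx->get_indexed_header(key));  // the entry just added
  write("%O\n", ctx->get_indexed_header(2));    // static entry :method GET
  return 0;
}
```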


Method create

HPack.Context HPack.Context(int|void protocol_dynamic_max_size)

Description

Create a new HPack Context.

Parameter static_max_size

This is the static maximum size in bytes (as calculated by RFC 7541 section 4.1) of the dynamic header table. It defaults to DEFAULT_HEADER_TABLE_SIZE, and is the upper limit for set_dynamic_size().

See also

set_dynamic_size()


Method decode

array(array(string(8bit)|HPackFlags)) decode(Stdio.Buffer buf)

Description

Decode an HPack header block.

Parameter buf

Input buffer.

Returns

Returns an array of headers; see decode_header() for the format of each entry.

See also

decode_header(), encode()


Method decode_header

array(string(8bit)|HPackFlags) decode_header(Stdio.Buffer buf)

Description

Decode a single HPack header.

Parameter buf

Input buffer.

Returns

Returns UNDEFINED on empty buffer. Returns an array with a header and value otherwise:

Array

string(8bit) 0

Name of the header. Under normal circumstances this is always lower-case, but no check is currently performed.

string(8bit) 1

Value of the header.

HPackFlags|void 2

Optional encoding flags. Only set for fields having HEADER_NEVER_INDEXED.

The elements in the array are in the same order and compatible with the arguments to encode_header().

Throws

Throws on encoding errors.

Note

The returned array MUST NOT be modified.

Note

In future implementations the result array may get extended with a flag field.

Note

The in-band signalling of encoding table sizes is handled internally.

See also

decode(), encode_header()


Variable dynamic_headers

protected array(array(string(8bit))) HPack.Context.dynamic_headers

Description

Table of currently available dynamically defined headers.

New entries are appended last, and the first dynamic_prefix elements are not used.

See also

header_index, add_header()


Variable dynamic_max_size

protected int HPack.Context.dynamic_max_size

Description

Current upper size limit in bytes for dynamic_headers.

See also

set_dynamic_size()


Variable dynamic_prefix

protected int HPack.Context.dynamic_prefix

Description

Index of the first available header in dynamic_headers.


Variable dynamic_size

protected int HPack.Context.dynamic_size

Description

Current size in bytes of dynamic_headers.


Method encode

void encode(array(array(string(8bit)|HPackFlags)) headers, Stdio.Buffer buf)

Description

Encode a full set of headers.

Parameter headers

An array of ({ header, value })-tuples.

Parameter buf

Output buffer.

See also

encode_header(), decode()


Method encode

variant string(8bit) encode(array(array(string(8bit))) headers)

Description

Convenience variant of encode().

Parameter headers

An array of ({ header, value })-tuples.

Returns

Returns the corresponding HPack encoding.
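A minimal sketch of an encode/decode round trip using two independent contexts, one per direction of a connection. The helper hpack_round_trip and the sample headers are hypothetical; the exact bytes of the encoded block are not shown because they depend on the encoder's indexing choices.

```pike
// Hypothetical helper: encode with one Context, decode with another, compare.
int hpack_round_trip(array(array(string(8bit))) headers)
{
  // Each side keeps its own Context; they stay in sync because the decoder
  // sees the header blocks in the same order as they were encoded.
  HPack.Context encoder = HPack.Context();
  HPack.Context decoder = HPack.Context();
  string(8bit) block = encoder->encode(headers);  // convenience variant
  return equal(decoder->decode(Stdio.Buffer(block)), headers);
}

int main()
{
  array(array(string(8bit))) headers = ({
    ({ ":method", "GET" }),
    ({ ":path", "/index.html" }),
    ({ "user-agent", "pike-example" })
  });
  write("round trip ok: %d\n", hpack_round_trip(headers));
  return 0;
}
```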


Method encode_header

void encode_header(Stdio.Buffer buf, string(8bit) header, string(8bit) value, HPackFlags|void flags)

Description

Encode a single HPack header.

Parameter buf

Output buffer.

Parameter header

Name of header. This should under normal circumstances be a lower-case string, but this is currently not checked.

Parameter value

Header value.

Parameter flags

Optional encoding flags.

See also

encode(), decode_header()
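A sketch of sending a sensitive header with HPack.HEADER_NEVER_INDEXED. According to the decode_header() description, the decoded array should then carry the flag as a third element. The helper encode_decode_sensitive and the sample token are hypothetical.

```pike
// Hypothetical helper: encode one header marked never-indexed, decode it again.
array encode_decode_sensitive(string(8bit) name, string(8bit) value)
{
  HPack.Context encoder = HPack.Context(), decoder = HPack.Context();
  Stdio.Buffer buf = Stdio.Buffer();
  encoder->encode_header(buf, name, value, HPack.HEADER_NEVER_INDEXED);
  return decoder->decode_header(buf);  // reads the header back out of buf
}

int main()
{
  // Credentials should not end up in any compression table.
  write("%O\n", encode_decode_sensitive("authorization", "Bearer example-token"));
  return 0;
}
```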


Method evict_dynamic_headers

protected void evict_dynamic_headers()

Description

Evict dynamic headers until dynamic_size goes below dynamic_max_size.


Method get_indexed_header

array(string(8bit)) get_indexed_header(int(1..) index)

Description

Lookup a known header.

Parameter index

Encoding key for the header to retrieve.

Returns

Returns UNDEFINED on unknown header. Returns an array with a header and value otherwise:

Array

string(8bit) 0

Name of the header. Under normal circumstances this is always lower-case, but no check is currently performed.

string(8bit) 1

Value of the header.

See also

add_header()


Variable header_index

protected mapping(string(8bit):int|mapping(string(8bit):int)) HPack.Context.header_index

Description

Index into dynamic_headers and static_headers.

"header_name" : mapping(string(8bit):int)|int

Indexed on the header name in lower-case. The value is one of:

int

Index value for the header value "".

mapping(string(8bit):int)

"header_value" : int

Index value for the corresponding header value.

The index values in turn are coded as follows:

(1..)

Index into static_header_tab offset by 1.

0

Not used.

(..-1)

Inverted (`~()) index into dynamic_headers.

See also

dynamic_headers, static_header_tab, add_header()


Method put_int

protected void put_int(Stdio.Buffer buf, int(8bit) bits, int(8bit) mask, int value)

Description

Encode an integer with the HPack integer encoding.

Parameter buf

Output buffer.

Parameter bits

Bits that should always be set in the first byte of output.

Parameter mask

Bitmask for the value part of the first byte of output.

Parameter value

Integer value to encode.
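put_int() is protected, so it cannot be called from outside a Context. The following hypothetical stand-in implements the integer representation from RFC 7541 section 5.1 with the same bits/mask convention, to illustrate what put_int() is documented to write; it is not the module's own code. The sample values come from RFC 7541 appendix C.1.

```pike
// Hypothetical stand-in for the protected put_int(): RFC 7541 section 5.1.
// mask is the prefix mask (31 for a 5-bit prefix); bits are the flag bits
// OR:ed into the first byte.
string hpack_encode_int(int(8bit) bits, int(8bit) mask, int(0..) value)
{
  if (value < mask)
    return sprintf("%c", bits | value);         // fits in the prefix
  string res = sprintf("%c", bits | mask);      // prefix bits all set
  value -= mask;
  while (value >= 128) {
    res += sprintf("%c", (value & 127) | 128);  // low 7 bits, continuation set
    value >>= 7;
  }
  return res + sprintf("%c", value);            // last byte, continuation clear
}

int main()
{
  // RFC 7541 C.1.1 and C.1.2: 10 and 1337 with a 5-bit prefix.
  write("%s\n", String.string2hex(hpack_encode_int(0, 31, 10)));    // 0a
  write("%s\n", String.string2hex(hpack_encode_int(0, 31, 1337)));  // 1f9a0a
  return 0;
}
```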


Method put_string

protected void put_string(Stdio.Buffer buf, string(8bit) str)

Description

Encode a string with the HPack string encoding.

Parameter buf

Output buffer.

Parameter str

String to output.

The encoder will huffman_encode() the string if that renders a shorter encoding than the verbatim string.


Method set_dynamic_size

void set_dynamic_size(Stdio.Buffer buf, int(0..) new_max_size)

Description

Set the dynamic maximum size of the dynamic header lookup table.

Parameter buf

Output buffer.

Parameter new_max_size

New dynamic maximum size in bytes (as calculated by RFC 7541 section 4.1).

Note

This function can be used to clear the dynamic header table by setting the size to zero.

Note

Also note that the new_max_size has an upper bound that is limited by static_max_size.

See also

encode_header(), encode(), create().


Variable static_max_size

protected int HPack.Context.static_max_size

Description

Static upper size limit in bytes for dynamic_headers.

See also

create(), set_dynamic_size()