15. Compression

Module Gz

Description

The Gz module contains functions to compress and uncompress strings using the same algorithm as the program gzip. Compressing can be done in streaming mode or all at once.

The Gz module consists of two classes: Gz.deflate and Gz.inflate. Gz.deflate is used to pack data and Gz.inflate is used to unpack data. (Think "inflatable boat".)

Note

Note that this module is only available if the gzip library was available when Pike was compiled.

Note that although these functions use the same algorithm as gzip, they do not use the exact same format, so you cannot directly unzip gzipped files with these routines. Use Gz.File to read and write gzip files.


Constant DEFAULT_STRATEGY

constant Gz.DEFAULT_STRATEGY

Description

The default strategy as selected in the zlib library.


Constant FILTERED

constant Gz.FILTERED

Description

This strategy is intended for data created by a filter or predictor and will put more emphasis on Huffman encoding and less on LZ string matching. This is between DEFAULT_STRATEGY and HUFFMAN_ONLY.


Constant FIXED

constant Gz.FIXED

Description

In this mode dynamic Huffman codes are disabled, allowing for a simpler decoder for special applications. This mode is not available in all zlib versions.


Constant HUFFMAN_ONLY

constant Gz.HUFFMAN_ONLY

Description

This strategy will turn off string matching completely, only doing Huffman encoding. Window size doesn't matter in this mode and the data can be decompressed with a zero size window.


Constant RLE

constant Gz.RLE

Description

This strategy is even closer to HUFFMAN_ONLY in that it only looks at the latest byte in the window, i.e. a window size of 1 byte is sufficient for decompression. This mode is not available in all zlib versions.


Method adler32

int adler32(string(8bit) data, void|int(0..) start_value)

Description

This function calculates the Adler-32 checksum.


Method compress

string(8bit) compress(string(8bit)|String.Buffer|System.Memory|Stdio.Buffer data, void|bool raw, void|int(0..9) level, void|int strategy, void|int(8..15) window_size)

Description

Encodes and returns the input data according to the deflate format defined in RFC 1951.

Parameter data

The data to be encoded.

Parameter raw

If set, the data is encoded without the header and footer defined in RFC 1950. One example use is the ZIP container format.

Parameter level

Indicates the level of effort spent to make the data compress well. Zero means no packing, 2-3 is considered 'fast', 8 is default and higher is considered 'slow' but gives better packing.

Parameter strategy

The strategy to be used when compressing the data. One of the following.

DEFAULT_STRATEGY

The default strategy as selected in the zlib library.

FILTERED

This strategy is intended for data created by a filter or predictor and will put more emphasis on Huffman encoding and less on LZ string matching. This is between DEFAULT_STRATEGY and HUFFMAN_ONLY.

RLE

This strategy is even closer to HUFFMAN_ONLY in that it only looks at the latest byte in the window, i.e. a window size of 1 byte is sufficient for decompression. This mode is not available in all zlib versions.

HUFFMAN_ONLY

This strategy will turn off string matching completely, only doing Huffman encoding. Window size doesn't matter in this mode and the data can be decompressed with a zero size window.

FIXED

In this mode dynamic Huffman codes are disabled, allowing for a simpler decoder for special applications. This mode is not available in all zlib versions.

Parameter window_size

Defines the size of the LZ77 window from 256 bytes to 32768 bytes, expressed as 2^x.

See also

deflate, inflate, uncompress


Method crc32

int crc32(string(8bit) data, void|int(0..) start_value)

Description

This function calculates the standard ISO 3309 Cyclic Redundancy Check.
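
As a minimal sketch of how start_value might be used with crc32 and adler32, the following program checksums a string in two pieces and compares the result with the one-shot checksum. It assumes that start_value takes the checksum of the preceding data, as in the underlying zlib functions; the sample strings are purely illustrative.

```pike
int main()
{
  string(8bit) a = "Hello, ";
  string(8bit) b = "world!";

  // One-shot checksums over the whole string.
  int crc = Gz.crc32(a + b);
  int adler = Gz.adler32(a + b);

  // Incremental checksums: the checksum of the first piece is passed
  // as start_value for the second piece (assumed zlib semantics).
  int crc_inc = Gz.crc32(b, Gz.crc32(a));
  int adler_inc = Gz.adler32(b, Gz.adler32(a));

  write("crc32:   %08x (incremental %08x)\n", crc, crc_inc);
  write("adler32: %08x (incremental %08x)\n", adler, adler_inc);
  return 0;
}
```

If the assumption holds, both lines show the same value twice, which is what makes it possible to checksum data that arrives in blocks.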


Method uncompress

string(8bit) uncompress(string(8bit)|String.Buffer|System.Memory|Stdio.Buffer data, void|bool raw)

Description

Uncompresses the data and returns it. The raw parameter tells the decoder that the input data lacks the header and footer defined in RFC 1950.
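
The following sketch shows one way compress and uncompress might be paired, both with and without the RFC 1950 wrapper. The sample data and the choice of level 9 are illustrative assumptions, not recommendations.

```pike
int main()
{
  // Repetitive sample input, so that it compresses well.
  string(8bit) data = "Pike compresses repetitive text well. " * 50;

  // Default: deflate data wrapped in the RFC 1950 header and footer.
  string(8bit) packed = Gz.compress(data);

  // Raw deflate data (no RFC 1950 wrapper) at level 9.
  string(8bit) raw = Gz.compress(data, 1, 9);

  write("%d bytes -> %d wrapped, %d raw\n",
        sizeof(data), sizeof(packed), sizeof(raw));

  // The decoder must be told whether the wrapper is present.
  if (Gz.uncompress(packed) != data)
    error("Wrapped round trip failed.\n");
  if (Gz.uncompress(raw, 1) != data)
    error("Raw round trip failed.\n");
  return 0;
}
```

The raw flag has to match on both sides: data produced with raw set is expected to be decoded with raw set as well.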

Class Gz.File


Inherit _file

inherit ._file : _file

Description

Allows the user to open a Gzip archive and read and write its contents in an uncompressed form, emulating the Stdio.File interface.

Note

An important limitation on this class is that it may only be used for reading or writing, not both at the same time. Please also note that if you want to reopen a file for reading after a write, you must close the file before calling open or strange effects might be the result.


Method create

Gz.File Gz.File(void|string|int|Stdio.Stream file, void|string mode)

Parameter file

Filename or file descriptor of the gzip file to open, or an already open Stream.

Parameter mode

Mode for the file. Defaults to "rb".

See also

open, Stdio.File


Method line_iterator

String.SplitIterator|Stdio.LineIterator line_iterator(int|void trim)

Description

Returns an iterator that will loop over the lines in this file. If trim is true, all '\r' characters will be removed from the input.


Method open

int open(string|int|Stdio.Stream file, void|string mode)

Parameter file

Filename or file descriptor of the gzip file to open, or an already open Stream.

Parameter mode

Mode for the file. Defaults to "rb". May be one of the following:

rb

read mode

wb

write mode

ab

append mode

For the wb and ab modes, additional parameters may be specified. Please see the zlib manual for more information.

Returns

Non-zero if successful.


Method read

int|string read(void|int length)

Description

Reads data from the file. If no argument is given, the whole file is read.


Method read_function

function(:string) read_function(int nbytes)

Description

Returns a function that, when called, will call read with nbytes as argument. Can be used to get various callback functions, e.g. for the fourth argument to String.SplitIterator.
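
A minimal sketch of writing a gzip file and reading it back with Gz.File might look as follows. The path is hypothetical, and the "wb9" mode string assumes that a digit in the mode selects the compression level, as described for Gz._file.

```pike
int main()
{
  // Hypothetical path, used for illustration only.
  string path = "/tmp/gz_file_example.gz";

  // Write mode, with compression level 9 requested in the mode string.
  Gz.File writer = Gz.File(path, "wb9");
  writer->write("first line\nsecond line\n");
  writer->close();   // Close before reopening the file for reading.

  // Read mode; with no length argument the whole file is read.
  Gz.File reader = Gz.File(path, "rb");
  string contents = reader->read();
  reader->close();

  write("%s", contents);
  rm(path);          // Remove the example file again.
  return 0;
}
```

A separate object is used for each direction because a Gz.File may only be used for reading or for writing, not both at once.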

Class Gz._file

Description

Low-level implementation of read/write support for GZip files.


Method close

int close()

Description

Closes the file.

Returns

1 if successful.


Method create

Gz._file Gz._file(void|string|Stdio.Stream gzFile, void|string mode)

Description

Opens a gzip file for reading.


Method eof

bool eof()

Returns

1 if EOF has been reached.


Method open

int open(string|int|Stdio.Stream file, void|string mode)

Description

Opens a file for I/O.

Parameter file

The filename or an open file descriptor or Stream for the GZip file to use.

Parameter mode

Mode for the file operations. Defaults to read only. The following mode characters are unique to Gz.File.

"0"
"1"
"2"
"3"
"4"
"5"
"6"
"7"
"8"
"9"

Values 0 to 9 set the compression level from no compression to maximum available compression. Defaults to 6.

"f"

Sets the compression strategy to FILTERED.

"h"

Sets the compression strategy to HUFFMAN_ONLY.

Note

If the object already has been opened, it will first be closed.


Method read

int|string read(int len)

Description

Reads len (uncompressed) bytes from the file. If read is unsuccessful, 0 is returned.


Method seek

int seek(int pos, void|int type)

Description

Seeks within the file.

Parameter pos

Position relative to the seek type.

Parameter type

SEEK_SET

Set the current position in the file to pos.

SEEK_CUR

The new position is the current position plus pos.

SEEK_END

Not supported.

Returns

New position or negative number if seek failed.


Method setparams

int setparams(void|int(0..9) level, void|int strategy, void|int(8..15) window_size)

Description

Sets the encoding level, strategy and window_size.

See also

Gz.deflate


Method tell

int tell()

Returns

The current position within the file.


Method write

int write(string data)

Description

Writes the data to the file.

Returns

The number of bytes written to the file.

Class Gz.deflate

Description

This class interfaces with the compression routines in the libz library.

Note

This class is only available if libz was available and found when Pike was compiled.

See also

Gz.inflate(), Gz.compress(), Gz.uncompress()


Method clone

Gz.deflate clone()

Description

Clones the deflate object. Typically used to test compression of new content using the same exact state.


Method create

Gz.deflate Gz.deflate(int(-9..9)|void level, int|void strategy, int(8..15)|void window_size)
Gz.deflate Gz.deflate(mapping options)

Description

This function can also be used to re-initialize a Gz.deflate object so it can be re-used.

If a mapping is passed as the only argument, it will accept the parameters described below as indices, and additionally it accepts a string as dictionary.

Parameter level

Indicates the level of effort spent to make the data compress well. Zero means no packing, 2-3 is considered 'fast', 6 is default and higher is considered 'slow' but gives better packing.

If the argument is negative, no headers will be emitted. This is needed to produce ZIP-files, as an example. The negative value is then negated, and handled as a positive value.

Parameter strategy

The strategy to be used when compressing the data. One of the following.

DEFAULT_STRATEGY

The default strategy as selected in the zlib library.

FILTERED

This strategy is intended for data created by a filter or predictor and will put more emphasis on Huffman encoding and less on LZ string matching. This is between DEFAULT_STRATEGY and HUFFMAN_ONLY.

RLE

This strategy is even closer to HUFFMAN_ONLY in that it only looks at the latest byte in the window, i.e. a window size of 1 byte is sufficient for decompression. This mode is not available in all zlib versions.

HUFFMAN_ONLY

This strategy will turn off string matching completely, only doing Huffman encoding. Window size doesn't matter in this mode and the data can be decompressed with a zero size window.

FIXED

In this mode dynamic Huffman codes are disabled, allowing for a simpler decoder for special applications. This mode is not available in all zlib versions.

Parameter window_size

Defines the size of the LZ77 window from 256 bytes to 32768 bytes, expressed as 2^x.


Method deflate

string(8bit) deflate(string(8bit)|String.Buffer|System.Memory|Stdio.Buffer data, int|void flush)

Description

This function performs gzip style compression on a string data and returns the packed data. Streaming can be done by calling this function several times and concatenating the returned data.

The optional argument flush should be one of the following:

Gz.NO_FLUSH

Only data that doesn't fit in the internal buffers is returned.

Gz.PARTIAL_FLUSH

All input is packed and returned.

Gz.SYNC_FLUSH

All input is packed and returned.

Gz.FINISH

All input is packed and an 'end of data' marker is appended (default).

See also

Gz.inflate->inflate()
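
The streaming use described above can be sketched as follows: each chunk is packed with Gz.NO_FLUSH, and a final call with Gz.FINISH ends the stream. The chunk contents and the compression level are illustrative assumptions.

```pike
int main()
{
  array(string(8bit)) chunks =
    ({ "first chunk, ", "second chunk, ", "last chunk" });

  Gz.deflate packer = Gz.deflate(6);
  string(8bit) packed = "";

  // Feed every chunk, letting the packer decide when to emit output.
  foreach (chunks, string(8bit) chunk)
    packed += packer->deflate(chunk, Gz.NO_FLUSH);

  // Pack whatever is still buffered and append the end of data marker.
  packed += packer->deflate("", Gz.FINISH);

  // The concatenated output forms a single stream for inflate.
  string(8bit) unpacked = Gz.inflate()->inflate(packed);
  if (unpacked != chunks * "")
    error("Round trip failed.\n");

  write("%d bytes packed, %d bytes unpacked\n",
        sizeof(packed), sizeof(unpacked));
  return 0;
}
```

With Gz.NO_FLUSH most calls are likely to return an empty string for input this small; the bulk of the output then arrives with the Gz.FINISH call.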

Class Gz.inflate

Description

This class interfaces with the uncompression routines in the libz library.

Note

This program is only available if libz was available and found when Pike was compiled.

See also

deflate, compress, uncompress


Method create

Gz.inflate Gz.inflate(int|void window_size)
Gz.inflate Gz.inflate(mapping options)

Description

If called with a mapping as only argument, create accepts the entries window_size (described below) and dictionary, which is a string to be set as dictionary.

The window_size value is passed down to inflateInit2 in zlib.

If the argument is negative, no header checks are done, and no verification of the data will be done either. This is needed for uncompressing ZIP-files, as an example. The negative value is then negated, and handled as a positive value.

Positive arguments give the maximum dictionary size as an exponent of 2, so that 8 (the minimum) gives a window size of 256 bytes, and 15 (the maximum, and default value) gives a window size of 32Kb. Setting this to anything except 15 is rather pointless in Pike.

It can be used to limit the amount of memory that is used to uncompress files, but 32Kb is not all that much in the great scheme of things.

To decompress files compressed with level 9 compression, a 32Kb window size is needed. Level 1 compression only requires a 256 byte window.

If the options version is used you can specify your own dictionary in addition to the window size.

dictionary : string
window_size : int
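
As a sketch of the negative-argument convention, the following packs data without headers using a negative level in Gz.deflate and unpacks it using a negative window size in Gz.inflate. The values -9 and -15 and the sample data are illustrative assumptions.

```pike
int main()
{
  string(8bit) data = "raw deflate data, as stored in ZIP files " * 20;

  // Negative level: level 9, but no headers are emitted.
  string(8bit) raw = Gz.deflate(-9)->deflate(data, Gz.FINISH);

  // Negative window size: a 2^15 byte window, with no header checks.
  string(8bit) unpacked = Gz.inflate(-15)->inflate(raw);

  if (unpacked != data)
    error("Raw round trip failed.\n");
  write("%d bytes -> %d raw bytes\n", sizeof(data), sizeof(raw));
  return 0;
}
```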

Method end_of_stream

string(8bit) end_of_stream()

Description

This function returns 0 if the end of stream marker has not yet been encountered, or a string (possibly empty) containing any extra data received following the end of stream marker if the marker has been encountered. If the extra data is not needed, the result of this function can be treated as a logical value.
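
A small sketch of how end_of_stream might be used to recover data that follows a compressed stream is given below. The payload and the trailing bytes are made up for the example.

```pike
int main()
{
  string(8bit) packed = Gz.compress("payload");
  Gz.inflate unpacker = Gz.inflate();

  // Decode the stream followed by some unrelated trailing bytes.
  string(8bit) unpacked = unpacker->inflate(packed + "TRAILER");

  // 0 means the marker has not been seen; any string (even "")
  // means that the stream has ended.
  string(8bit) extra = unpacker->end_of_stream();
  if (extra)
    write("Got %O, followed by extra data %O\n", unpacked, extra);
  else
    write("Stream not finished yet; got %O so far\n", unpacked);
  return 0;
}
```

Note that an empty string is true in Pike, so the result can be tested directly even when there is no extra data.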


Method inflate

string(8bit) inflate(string(8bit)|String.Buffer|System.Memory|Stdio.Buffer data)

Description

This function performs gzip style decompression. It can inflate a whole file at once or in blocks.

Example

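// whole file
write(Gz.inflate()->inflate(Stdio.stdin->read(0x7fffffff)));

// streaming (blocks)
function inflate = Gz.inflate()->inflate;
string s;
// read() returns "" at end of file, and "" is true in Pike,
// so the size has to be tested as well.
while ((s = Stdio.stdin->read(8192)) && sizeof(s))
  write(inflate(s));

See also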

Gz.deflate->deflate(), Gz.uncompress

Module Bz2

Description

The Bz2 module contains functions to compress and uncompress strings using the same algorithm as the program bzip2. Compressing and decompressing can be done in streaming mode feeding the compress and decompress objects with arbitrarily large pieces of data.

The Bz2 module consists of three classes: Bz2.Deflate, Bz2.Inflate and Bz2.File. Bz2.Deflate is used to compress data and Bz2.Inflate is used to uncompress data. Bz2.File is used to handle Bzip2 files.

Note

Note that this module is only available if libbzip2 was available when Pike was compiled.

Note that although the functions in Inflate and Deflate use the same algorithm as bzip2, they do not use the exact same format, so you cannot directly create or unpack bzip2 files using those functions. That is why there is a third class for files.


Inherit Bz2

inherit "___Bz2" : Bz2

Class Bz2.Deflate

Description

Bz2.Deflate is a builtin program written in C. It interfaces with the packing routines in the bzlib library.

Note

This program is only available if bzlib was available and found when Pike was compiled.

See also

Bz2.Inflate()


Method create

Bz2.Deflate Bz2.Deflate(int(1..9)|void block_size)

Description

If given, block_size should be a number from 1 to 9 indicating the block size used when doing compression. The actual block size will be 100000 times this number. Low numbers are considered 'fast', higher numbers are considered 'slow' but give better packing. The parameter is set to 9 if it is omitted.

This function can also be used to re-initialize a Bz2.Deflate object so it can be re-used.


Method deflate

string deflate(string data, int(0..2)|void flush_mode)

Description

This function performs bzip2 style compression on a string data and returns the packed data. Streaming can be done by calling this function several times and concatenating the returned data.

The optional argument flush_mode should be one of the following:

Bz2.BZ_RUN

Runs Bz2.Deflate->feed()

Bz2.BZ_FLUSH

Runs Bz2.Deflate->read()

Bz2.BZ_FINISH

Runs Bz2.Deflate->finish()

See also

Bz2.Inflate->inflate()


Method feed

void feed(string data)

Description

This function feeds the data to the internal buffers of the Deflate object. All data is buffered until a read or a finish is done.

See also

Bz2.Deflate->read(), Bz2.Deflate->finish()


Method finish

string finish(string data)

Description

This method feeds the data to the internal buffers of the Deflate object. Then it compresses all buffered data, adds an end of data marker to it, returns the compressed data as a string, and reinitializes the deflate object.

See also

Bz2.Deflate->feed(), Bz2.Deflate->read()


Method read

string read(string data)

Description

This function feeds the data to the internal buffers of the Deflate object. Then it compresses all buffered data and returns the compressed data as a string.

See also

Bz2.Deflate->feed(), Bz2.Deflate->finish()
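
How feed, read and finish might be combined can be sketched as follows. The pieces of text are illustrative, and the round trip through Bz2.Inflate is only there to show that the concatenated output is treated as one stream.

```pike
int main()
{
  Bz2.Deflate packer = Bz2.Deflate(9);
  string packed = "";

  // feed() only buffers the data; nothing is returned.
  packer->feed("first piece, ");

  // read() buffers its argument, then compresses and returns
  // everything that has been buffered so far.
  packed += packer->read("second piece, ");

  // finish() does the same and also adds the end of data marker.
  packed += packer->finish("last piece");

  string unpacked = Bz2.Inflate()->inflate(packed);
  write("%d bytes packed, unpacked to %O\n", sizeof(packed), unpacked);
  return 0;
}
```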

Class Bz2.File

Description

Low-level implementation of read/write support for Bzip2 files.

Note

This class is currently not available on Windows.


Inherit File

inherit Bz2::File : File


Method close

bool close()

Description

Closes the file.


Method create

Bz2.File Bz2.File()
Bz2.File Bz2.File(string filename, void|string mode)

Description

Creates a Bz2.File object


Method eof

bool eof()

Returns

1 if EOF has been reached, 0 otherwise


Method line_iterator

String.SplitIterator|Stdio.LineIterator line_iterator(int|void trim)

Description

Returns an iterator that will loop over the lines in this file. If trim is true, all '\r' characters will be removed from the input.


Method open

bool open(string file, void|string mode)

Description

Opens a file for I/O.

Parameter file

The name of the file to be opened

Parameter mode

Mode for the file operations. Can be either "r" (read) or "w" (write). Read is the default.


Method read

string read(int len)

Description

Reads len (uncompressed) bytes from the file. If len is omitted the whole file is read. If read is unsuccessful, 0 is returned.


Method read_function

function(:string) read_function(int nbytes)

Description

Returns a function that, when called, will call read with nbytes as argument. Can be used to get various callback functions, e.g. for the fourth argument to String.SplitIterator.


Method read_open

bool read_open(string file)

Description

Opens a file for reading.

Parameter file

The name of the file to be opened


Method write

int write(string data)

Description

Writes the data to the file.

Returns

the number of bytes written to the file.


Method write_open

bool write_open(string file)

Description

Opens a file for writing.

Parameter file

The name of the file to be opened
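
A minimal sketch of writing and then reading a Bzip2 file with Bz2.File is shown below. The path is hypothetical, and the example assumes a platform where this class is available.

```pike
int main()
{
  // Hypothetical path, used for illustration only.
  string path = "/tmp/bz2_file_example.bz2";

  Bz2.File writer = Bz2.File();
  if (!writer->write_open(path))
    error("Failed to open %s for writing.\n", path);
  writer->write("first line\nsecond line\n");
  writer->close();

  Bz2.File reader = Bz2.File();
  if (!reader->read_open(path))
    error("Failed to open %s for reading.\n", path);
  // With no length given, the whole file is read.
  string contents = reader->read();
  reader->close();

  write("%s", contents);
  rm(path);   // Remove the example file again.
  return 0;
}
```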

Class Bz2.Inflate

Description

Bz2.Inflate is a builtin program written in C. It interfaces with the unpacking routines in the bzlib library.

Note

This program is only available if bzlib was available and found when Pike was compiled.

See also

Deflate


Method create

Bz2.Inflate Bz2.Inflate()


Method inflate

string inflate(string data)

Description

This function performs bzip2 style decompression. It can do decompression with arbitrarily large pieces of data. When fed with data, it decompresses as much as it can and buffers the rest.

Example

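Bz2.Inflate inflate_object = Bz2.Inflate();
string uncompressed_concatenated_data = "";
// Feed the compressed data (compressed_data) in pieces of 10 bytes.
for (int i = 0; i < sizeof(compressed_data); i += 10)
  uncompressed_concatenated_data +=
    inflate_object->inflate(compressed_data[i..i+9]);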

See also

Bz2.Deflate->deflate()

Module HPack

Description

Implementation of the HPACK (RFC 7541) header packing standard.

This is the header packing system that is used in HTTP/2 (RFC 7540).


Inherit "___HPack"

inherit "___HPack" : "___HPack"


Constant DEFAULT_HEADER_TABLE_SIZE

constant int HPack.DEFAULT_HEADER_TABLE_SIZE

Description

This is the default static maximum size of the dynamic header table.

This constant is taken from RFC 7540 section 6.5.2.


Constant static_header_tab

constant HPack.static_header_tab

Description

Table of static headers. RFC 7541 appendix A, Table 1.

Array
array(string(8bit)) 0..60
Array
string(8bit) 0

Header name.

string(8bit) 1

Default value.

Note

Note that this table is indexed starting at 0 (zero), while the corresponding table in RFC 7541 starts at 1 (one).


Variable static_header_index

protected mapping(string(8bit):int|mapping(string(8bit):int)) HPack.static_header_index

Description

Index for static_header_tab.

Note

Note that the indices are offset by 1 (one).

Note

This variable should be regarded as a constant.

This variable is used to initialize the header index in the Context.

See also

static_header_tab, Context()->header_index


Method create_index

protected mapping(string(8bit):int|mapping(string(8bit):int)) create_index(array(array(string(8bit))) tab)

Description

Helper function used to create the static_header_index.


Method huffman_decode

string(8bit) huffman_decode(string(8bit) str)

Description

Decodes the string str encoded with the static huffman code specified in RFC 7541 appendix B.

Parameter str

String to decode.

Returns

Returns the decoded string.

See also

huffman_encode().


Method huffman_encode

string(8bit) huffman_encode(string(8bit) str)

Description

Encodes the string str with the static huffman code specified in RFC 7541 appendix B.

Parameter str

String to encode.

Returns

Returns the encoded string.

See also

huffman_decode().
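
As a short sketch, huffman_encode and huffman_decode can be checked against each other with a round trip. The sample string is an arbitrary choice.

```pike
int main()
{
  string(8bit) plain = "www.example.com";

  string(8bit) packed = HPack.huffman_encode(plain);
  write("%d bytes plain, %d bytes Huffman coded\n",
        sizeof(plain), sizeof(packed));

  // Decoding is expected to give back the original string.
  if (HPack.huffman_decode(packed) != plain)
    error("Huffman round trip failed.\n");
  return 0;
}
```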


Method update_index

protected void update_index(mapping(string(8bit):int|mapping(string(8bit):int)) index, int i, array(string(8bit)) key)

Description

Update the specified encoder lookup index.

Parameter index

Lookup index to add an entry to.

Parameter key

Lookup key to add.

Parameter i

Value to store in the index for the key.

Enum HPack.HPackFlags

Description

Flags for Context()->encode_header() et al.


Constant HEADER_INDEXED

constant HPack.HEADER_INDEXED

Description

Indexed header.


Constant HEADER_INDEXED_MASK

constant HPack.HEADER_INDEXED_MASK

Description

Bitmask for indexing mode.


Constant HEADER_NEVER_INDEXED

constant HPack.HEADER_NEVER_INDEXED

Description

Never indexed header.


Constant HEADER_NOT_INDEXED

constant HPack.HEADER_NOT_INDEXED

Description

Unindexed header.

Class HPack.Context

Description

Context for an HPack encoder or decoder.

This class implements the majority of RFC 7541.

Functions of interest are typically encode() and decode().


Variable dynamic_headers

protected array(array(string(8bit))) HPack.Context.dynamic_headers

Description

Table of currently available dynamically defined headers.

New entries are appended last, and the first dynamic_prefix elements are not used.

See also

header_index, add_header()


Variable dynamic_max_size

protected int HPack.Context.dynamic_max_size

Description

Current upper size limit in bytes for dynamic_headers.

See also

set_dynamic_size()


Variable dynamic_prefix

protected int HPack.Context.dynamic_prefix

Description

Index of the first available header in dynamic_headers.


Variable dynamic_size

protected int HPack.Context.dynamic_size

Description

Current size in bytes of dynamic_headers.


Variable header_index

protected mapping(string(8bit):int|mapping(string(8bit):int)) HPack.Context.header_index

Description

Index into dynamic_headers and static_header_tab.

"header_name" : mapping(string(8bit):int)|int

Indexed on the header name in lower-case. The value is one of:

int

Index value for the header value "".

mapping(string(8bit):int)
"header_value" : int

Index value for the corresponding header value.

The index values in turn are coded as follows:

(1..)

Index into static_header_tab offset by 1.

0

Not used.

(..-1)

Inverted (`~()) index into dynamic_headers.

See also

dynamic_headers, static_header_tab, add_header()


Variable static_max_size

protected int HPack.Context.static_max_size

Description

Static upper size limit in bytes for dynamic_headers.

See also

create(), set_dynamic_size()


Method add_header

int(0)|int(62) add_header(string(8bit) header, string(8bit) value)

Description

Add a header to the table of known headers and to the header index.

Parameter header

Name of header to add.

Parameter value

Value of the header.

Returns

Returns 0 (zero) if the header was too large to store. Returns the encoding key for the header on success (this is always sizeof(static_header_tab) + 1 (i.e. 62), as new headers are prepended to the dynamic header table).

Note

Adding a header may cause old headers to be evicted from the table.

See also

get_indexed_header()


Method create

HPack.Context HPack.Context(int|void protocol_dynamic_max_size)

Description

Create a new HPack Context.

Parameter protocol_dynamic_max_size

This is the static maximum size in bytes (as calculated by RFC 7541 section 4.1) of the dynamic header table. It defaults to DEFAULT_HEADER_TABLE_SIZE, and is the upper limit for set_dynamic_size().

See also

set_dynamic_size()


Method decode

array(array(string(8bit)|HPackFlags)) decode(Stdio.Buffer buf)

Description

Decode an HPack header block.

Parameter buf

Input buffer.

Returns

Returns an array of headers. Cf. decode_header().

See also

decode_header(), encode()


Method decode_header

array(string(8bit)|HPackFlags) decode_header(Stdio.Buffer buf)

Description

Decode a single HPack header.

Parameter buf

Input buffer.

Returns

Returns UNDEFINED on empty buffer. Returns an array with a header and value otherwise:

Array
string(8bit) 0

Name of the header. Under normal circumstances this is always lower-case, but no check is currently performed.

string(8bit) 1

Value of the header.

HPackFlags|void 2

Optional encoding flags. Only set for fields having HEADER_NEVER_INDEXED.

The elements in the array are in the same order and compatible with the arguments to encode_header().

Throws

Throws on encoding errors.

Note

The returned array MUST NOT be modified.

Note

In future implementations the result array may get extended with a flag field.

Note

The in-band signalling of encoding table sizes is handled internally.

See also

decode(), encode_header()


Method encode

void encode(array(array(string(8bit)|HPackFlags)) headers, Stdio.Buffer buf)

Description

Encode a full set of headers.

Parameter headers

An array of ({ header, value })-tuples.

Parameter buf

Output buffer.

See also

encode_header(), decode()


Method encode

variant string(8bit) encode(array(array(string(8bit))) headers)

Description

Convenience variant of encode().

Parameter headers

An array of ({ header, value })-tuples.

Returns

Returns the corresponding HPack encoding.
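
A sketch of a full round trip through encode and decode might look as follows. It uses one Context per direction, since in HPACK the encoder and the decoder each maintain their own dynamic table; the header values are illustrative assumptions.

```pike
int main()
{
  // One context for each direction.
  HPack.Context encoder = HPack.Context();
  HPack.Context decoder = HPack.Context();

  array(array(string(8bit))) headers = ({
    ({ ":method", "GET" }),
    ({ ":path", "/index.html" }),
    ({ "user-agent", "example-client" }),
  });

  // The convenience variant returns the header block as a string.
  string(8bit) block = encoder->encode(headers);
  write("Encoded %d headers into %d bytes\n",
        sizeof(headers), sizeof(block));

  // decode() reads the header block from a Stdio.Buffer.
  // The returned arrays are only read here, never modified.
  foreach (decoder->decode(Stdio.Buffer(block)), array header)
    write("%s: %s\n", header[0], header[1]);
  return 0;
}
```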


Method encode_header

void encode_header(Stdio.Buffer buf, string(8bit) header, string(8bit) value, HPackFlags|void flags)

Description

Encode a single HPack header.

Parameter buf

Output buffer.

Parameter header

Name of header. This should under normal circumstances be a lower-case string, but this is currently not checked.

Parameter value

Header value.

Parameter flags

Optional encoding flags.

See also

encode(), decode_header()


Method evict_dynamic_headers

protected void evict_dynamic_headers()

Description

Evict dynamic headers until dynamic_size goes below dynamic_max_size.


Method get_indexed_header

array(string(8bit)) get_indexed_header(int(1..) index)

Description

Lookup a known header.

Parameter index

Encoding key for the header to retrieve.

Returns

Returns UNDEFINED on unknown header. Returns an array with a header and value otherwise:

Array
string(8bit) 0

Name of the header. Under normal circumstances this is always lower-case, but no check is currently performed.

string(8bit) 1

Value of the header.

See also

add_header()


Method put_int

protected void put_int(Stdio.Buffer buf, int(8bit) bits, int(8bit) mask, int value)

Description

Encode an integer with the HPack integer encoding.

Parameter buf

Output buffer.

Parameter bits

Bits that should always be set in the first byte of output.

Parameter mask

Bitmask for the value part of the first byte of output.

Parameter value

Integer value to encode.
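
To illustrate the integer encoding itself, here is a stand-alone sketch of the prefix coding from RFC 7541 section 5.1. The function hpack_int is hypothetical and not part of the HPack module; it returns a string rather than writing to a buffer, and is not claimed to match how put_int is implemented.

```pike
// Hypothetical stand-alone helper; not part of the HPack module.
string hpack_int(int bits, int mask, int value)
{
  // Small values fit in the prefix bits of the first byte.
  if (value < mask)
    return sprintf("%c", bits | value);

  // Otherwise fill the prefix, then emit the remainder 7 bits at a
  // time, least significant group first, with the high bit set on
  // every byte except the last.
  string res = sprintf("%c", bits | mask);
  value -= mask;
  while (value >= 128) {
    res += sprintf("%c", (value & 127) | 128);
    value >>= 7;
  }
  return res + sprintf("%c", value);
}

int main()
{
  // The example from RFC 7541 section C.1.2: 1337 with a 5-bit
  // prefix (mask 31) is coded as the bytes 31, 154 and 10.
  write("%{%d %}\n", (array)hpack_int(0, 31, 1337));
  return 0;
}
```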


Method put_string

protected void put_string(Stdio.Buffer buf, string(8bit) str)

Description

Encode a string with the HPack string encoding.

Parameter buf

Output buffer.

Parameter str

String to output.

The encoder will huffman_encode() the string if that renders a shorter encoding than the verbatim string.


Method set_dynamic_size

void set_dynamic_size(Stdio.Buffer buf, int(0..) new_max_size)

Description

Set the dynamic maximum size of the dynamic header lookup table.

Parameter buf

Output buffer.

Parameter new_max_size

New dynamic maximum size in bytes (as calculated by RFC 7541 section 4.1).

Note

This function can be used to clear the dynamic header table by setting the size to zero.

Note

Also note that the new_max_size has an upper bound that is limited by static_max_size.

See also

encode_header(), encode(), create().