16. Compression
Module Gz
- Description
The Gz module contains functions to compress and uncompress strings using the same algorithm as the program gzip. Compressing can be done in streaming mode or all at once.
The Gz module consists of two classes: Gz.deflate and Gz.inflate. Gz.deflate is used to pack data and Gz.inflate is used to unpack data. (Think "inflatable boat".)
- Note
Note that this module is only available if the gzip library was available when Pike was compiled.
Note that although these functions use the same algorithm as gzip, they do not use the exact same format, so you cannot directly unzip gzipped files with these routines. Support for this will be added in the future.
- Constant
DEFAULT_STRATEGY
constant Gz.DEFAULT_STRATEGY
- Description
The default strategy as selected in the zlib library.
- Constant
FILTERED
constant Gz.FILTERED
- Description
This strategy is intended for data created by a filter or predictor and puts more emphasis on Huffman encoding and less on LZ string matching. It lies between DEFAULT_STRATEGY and HUFFMAN_ONLY.
- Constant
FIXED
constant Gz.FIXED
- Description
In this mode dynamic Huffman codes are disabled, allowing for a simpler decoder for special applications. This mode is not available in all zlib versions.
- Constant
HUFFMAN_ONLY
constant Gz.HUFFMAN_ONLY
- Description
This strategy turns off string matching completely and only does Huffman encoding. Window size doesn't matter in this mode, and the data can be decompressed with a zero size window.
- Constant
RLE
constant Gz.RLE
- Description
This strategy is even closer to HUFFMAN_ONLY in that it only looks at the latest byte in the window, i.e. a window size of 1 byte is sufficient for decompression. This mode is not available in all zlib versions.
- Method
adler32
int adler32(string(8bit) data, void|int(0..) start_value)
- Description
This function calculates the Adler-32 checksum.
- Method
check_header
string(8bit)|zero check_header(Stdio.Stream|void f, Stdio.Buffer|string(8bit)|void buf)
- Description
Check whether a file has a valid gzip header.
- Parameter
f File to check.
- Parameter
buf Prefix of f.
- Returns
Returns the content of f after the gzip header if a header was found. Returns 0 (zero) if there was no header.
- Method
compress
string(8bit) compress(string(8bit)|String.Buffer|System.Memory|Stdio.Buffer data, void|bool raw, void|int(0..9) level, void|int strategy, void|int(8..15) window_size)
- Description
Encodes and returns the input data according to the deflate format defined in RFC 1951.
- Parameter
data The data to be encoded.
- Parameter
raw If set, the data is encoded without the header and footer defined in RFC 1950. One example of use is the ZIP container format.
- Parameter
level Indicates the level of effort spent to make the data compress well. Zero means no packing, 2-3 is considered 'fast', 8 is default and higher is considered 'slow' but gives better packing.
- Parameter
strategy The strategy to be used when compressing the data. One of the following:
DEFAULT_STRATEGY The default strategy as selected in the zlib library.
FILTERED This strategy is intended for data created by a filter or predictor and puts more emphasis on Huffman encoding and less on LZ string matching. It lies between DEFAULT_STRATEGY and HUFFMAN_ONLY.
RLE This strategy is even closer to HUFFMAN_ONLY in that it only looks at the latest byte in the window, i.e. a window size of 1 byte is sufficient for decompression. This mode is not available in all zlib versions.
HUFFMAN_ONLY This strategy turns off string matching completely and only does Huffman encoding. Window size doesn't matter in this mode, and the data can be decompressed with a zero size window.
FIXED In this mode dynamic Huffman codes are disabled, allowing for a simpler decoder for special applications. This mode is not available in all zlib versions.
- Parameter
window_size Defines the size of the LZ77 window from 256 bytes to 32768 bytes, expressed as 2^x.
- See also
deflate, inflate, uncompress
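As a rough illustration of the level and strategy parameters, the sketch below compresses the same input with a few different settings and prints the resulting sizes. The sample data and the helper name packed_size are made up for this example, and the actual sizes depend on the zlib version Pike was built against.

```pike
// Hypothetical helper: size of the compressed form for a given level and strategy.
int packed_size(string(8bit) data, int level, int strategy)
{
  // raw = 0 keeps the RFC 1950 header and footer.
  return sizeof(Gz.compress(data, 0, level, strategy));
}

int main()
{
  // Highly repetitive sample input (an assumption for the demo).
  string(8bit) data = "The quick brown fox jumps over the lazy dog.\n" * 200;

  write("original:     %d bytes\n", sizeof(data));
  write("level 1:      %d bytes\n", packed_size(data, 1, Gz.DEFAULT_STRATEGY));
  write("level 9:      %d bytes\n", packed_size(data, 9, Gz.DEFAULT_STRATEGY));
  write("huffman only: %d bytes\n", packed_size(data, 9, Gz.HUFFMAN_ONLY));
  return 0;
}
```

On repetitive input like this, HUFFMAN_ONLY would be expected to come out noticeably larger than the default strategy, since string matching is switched off.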
- Method
crc32
int crc32(string(8bit) data, void|int(0..) start_value)
- Description
This function calculates the standard ISO 3309 Cyclic Redundancy Check.
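The start_value argument of crc32 and adler32 is not described further here. The sketch below assumes it is the running checksum of the data seen so far (as in zlib), so that a checksum can be built up piece by piece. The helper names are hypothetical.

```pike
// Hypothetical helpers: checksum of head + tail computed in two steps,
// assuming start_value carries the running checksum forward.
int chained_crc32(string(8bit) head, string(8bit) tail)
{
  return Gz.crc32(tail, Gz.crc32(head));
}

int chained_adler32(string(8bit) head, string(8bit) tail)
{
  return Gz.adler32(tail, Gz.adler32(head));
}

int main()
{
  string(8bit) head = "hello, ";
  string(8bit) tail = "world";

  // Compare the one-shot checksum with the chained one.
  write("crc32   whole=%x chained=%x\n",
        Gz.crc32(head + tail), chained_crc32(head, tail));
  write("adler32 whole=%x chained=%x\n",
        Gz.adler32(head + tail), chained_adler32(head, tail));
  return 0;
}
```

If the assumption holds, both columns print the same value on each line.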
- Method
make_header
int make_header(Stdio.Stream|Stdio.Buffer f)
- Description
Write a gzip header to a file or buffer.
- Parameter
f File or buffer to write a gzip header to.
- Returns
Returns 1 on success and 0 (zero) on failure.
- Method
uncompress
string(8bit) uncompress(string(8bit)|String.Buffer|System.Memory|Stdio.Buffer data, void|bool raw)
- Description
Uncompresses the data and returns it. The raw parameter tells the decoder that the input data lacks the header and footer defined in RFC 1950.
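A minimal round trip through compress and uncompress, with and without the raw flag, might look like the following sketch. The helper name roundtrip is hypothetical, and the sample data is arbitrary.

```pike
// Hypothetical helper: compress and uncompress with the same raw setting.
string(8bit) roundtrip(string(8bit) data, int raw)
{
  string(8bit) packed = Gz.compress(data, raw);
  // The raw flag must match on both sides: a raw stream has no
  // RFC 1950 header or footer for the decoder to check.
  return Gz.uncompress(packed, raw);
}

int main()
{
  string(8bit) data = "Pike " * 1000;

  if (roundtrip(data, 0) != data) error("zlib round trip failed\n");
  if (roundtrip(data, 1) != data) error("raw round trip failed\n");

  write("%d bytes -> %d (with header), %d (raw)\n", sizeof(data),
        sizeof(Gz.compress(data)), sizeof(Gz.compress(data, 1)));
  return 0;
}
```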
Class Gz.File
- Inherit
_file
inherit ._file : _file
- Description
Allows the user to open a gzip archive and read and write its contents in an uncompressed form, emulating the Stdio.File interface.
- Note
An important limitation of this class is that it may only be used for reading or writing, not both at the same time. Also note that if you want to reopen a file for reading after a write, you must close the file before calling open, or strange effects may result.
- Method
create
Gz.File Gz.File(void|string|int|Stdio.Stream file, void|string mode)
- Parameter
file Filename or file descriptor of the gzip file to open, or an already open Stream.
- Parameter
mode Mode for the file. Defaults to "rb".
- See also
open, Stdio.File
- Method
line_iterator
String.SplitIterator|Stdio.LineIterator line_iterator(int|void trim)
- Description
Returns an iterator that will loop over the lines in this file. If trim is true, all '\r' characters will be removed from the input.
- Method
open
int open(string|int|Stdio.Stream file, void|string mode)
- Parameter
file Filename or file descriptor of the gzip file to open, or an already open Stream.
- Parameter
mode Mode for the file. Defaults to "rb". May be one of the following:
- rb
read mode
- wb
write mode
- ab
append mode
For the wb and ab modes, additional parameters may be specified. Please see the zlib manual for more info.
- Returns
Non-zero if successful.
- Method
read
int|string read(void|int length)
- Description
Reads data from the file. If no argument is given, the whole file is read.
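The following sketch writes a small gzip file and reads it back. It assumes that Gz.File provides write() and close() in the style of Stdio.File, which this section implies but does not list; the helper name and the scratch path are hypothetical.

```pike
// Hypothetical helper: write text to a gzip file, then read it back.
string gz_file_roundtrip(string path, string text)
{
  // Write mode: data is compressed on its way to disk.
  Gz.File writer = Gz.File(path, "wb");
  writer->write(text);    // write() assumed, as in Stdio.File
  writer->close();        // close before reopening for reading

  // Read mode: with no argument the whole uncompressed content is returned.
  Gz.File reader = Gz.File(path, "rb");
  string content = reader->read();
  reader->close();

  rm(path);               // remove the scratch file
  return content;
}

int main()
{
  write("%s", gz_file_roundtrip("/tmp/gz_file_example.gz",
                                "first line\nsecond line\n"));
  return 0;
}
```

The file is closed between the write and the read because the class may only be used in one direction at a time.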
Class Gz._file
- Description
Low-level implementation of read/write support for GZip files
- Method
create
Gz._file Gz._file(void|string|Stdio.Stream gzFile, void|string mode)
- Description
Opens a gzip file for reading.
- Method
open
int open(string|int|Stdio.Stream file, void|string mode)
- Description
Opens a file for I/O.
- Parameter
file The filename or an open file descriptor or Stream for the GZip file to use.
- Parameter
mode Mode for the file operations. Defaults to read only. The following mode characters are unique to Gz.File.
"0" to "9" Values 0 to 9 set the compression level from no compression to maximum available compression. Defaults to 6.
"f" Sets the compression strategy to FILTERED.
"h" Sets the compression strategy to HUFFMAN_ONLY.
- Note
If the object already has been opened, it will first be closed.
- Method
read
int|string read(int len)
- Description
Reads len (uncompressed) bytes from the file. If read is unsuccessful, 0 is returned.
- Method
seek
int seek(int pos, void|int type)
- Description
Seeks within the file.
- Parameter
pos Position relative to the search type.
- Parameter
type One of the following:
SEEK_SET Set the current position in the file to pos.
SEEK_CUR The new position is the current position plus pos.
SEEK_END Not supported.
- Returns
New position, or a negative number if the seek failed.
- Method
setparams
int setparams(void|int(0..9) level, void|int strategy, void|int(8..15) window_size)
- Description
Sets the encoding level, strategy and window_size.
- See also
Gz.deflate
Class Gz.deflate
- Description
This class interfaces with the compression routines in the libz library.
- Note
This class is only available if libz was available and found when Pike was compiled.
- See also
Gz.inflate(), Gz.compress(), Gz.uncompress()
- Method
clone
Gz.deflate clone()
- Description
Clones the deflate object. Typically used to test compression of new content using the same exact state.
- Method
create
Gz.deflate Gz.deflate(int(-9..9)|void level, int|void strategy, int(8..15)|void window_size)
Gz.deflate Gz.deflate(mapping options)
- Description
This function can also be used to re-initialize a Gz.deflate object so it can be re-used.
If a mapping is passed as the only argument, it will accept the parameters described below as indices, and additionally it accepts a string as dictionary.
- Parameter
level Indicates the level of effort spent to make the data compress well. Zero means no packing, 2-3 is considered 'fast', 6 is default and higher is considered 'slow' but gives better packing.
If the argument is negative, no headers will be emitted. This is needed to produce ZIP files, for example. The negative value is then negated and handled as a positive value.
- Parameter
strategy The strategy to be used when compressing the data. One of the following:
DEFAULT_STRATEGY The default strategy as selected in the zlib library.
FILTERED This strategy is intended for data created by a filter or predictor and puts more emphasis on Huffman encoding and less on LZ string matching. It lies between DEFAULT_STRATEGY and HUFFMAN_ONLY.
RLE This strategy is even closer to HUFFMAN_ONLY in that it only looks at the latest byte in the window, i.e. a window size of 1 byte is sufficient for decompression. This mode is not available in all zlib versions.
HUFFMAN_ONLY This strategy turns off string matching completely and only does Huffman encoding. Window size doesn't matter in this mode, and the data can be decompressed with a zero size window.
FIXED In this mode dynamic Huffman codes are disabled, allowing for a simpler decoder for special applications. This mode is not available in all zlib versions.
- Parameter
window_size Defines the size of the LZ77 window from 256 bytes to 32768 bytes, expressed as 2^x.
- Method
deflate
string(8bit) deflate(string(8bit)|String.Buffer|System.Memory|Stdio.Buffer data, int|void flush)
- Description
This function performs gzip style compression on a string data and returns the packed data. Streaming can be done by calling this function several times and concatenating the returned data.
The optional argument flush should be one of the following:
Gz.NO_FLUSH Only data that doesn't fit in the internal buffers is returned.
Gz.PARTIAL_FLUSH All input is packed and returned.
Gz.SYNC_FLUSH All input is packed and returned.
Gz.FINISH All input is packed and an 'end of data' marker is appended (default).
- See also
Gz.inflate->inflate()
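Streaming compression as described above can be sketched as follows: each chunk is passed with Gz.NO_FLUSH, and a final call with Gz.FINISH closes the stream. The helper name deflate_chunks and the sample chunks are made up for the example.

```pike
// Hypothetical helper: compress a list of chunks as one stream.
string(8bit) deflate_chunks(array(string(8bit)) chunks)
{
  Gz.deflate packer = Gz.deflate(6);
  string(8bit) out = "";

  // NO_FLUSH lets the packer keep data in its internal buffers,
  // so individual calls may well return an empty string.
  foreach (chunks, string(8bit) chunk)
    out += packer->deflate(chunk, Gz.NO_FLUSH);

  // FINISH packs whatever is still buffered and appends the end marker.
  out += packer->deflate("", Gz.FINISH);
  return out;
}

int main()
{
  array(string(8bit)) chunks =
    ({ "first block, ", "second block, ", "third block" });

  string(8bit) packed = deflate_chunks(chunks);
  write("%s\n", Gz.inflate()->inflate(packed));
  return 0;
}
```

The concatenated output is a single stream, so it can be unpacked in one inflate call or fed to Gz.inflate in pieces.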
Class Gz.inflate
- Description
This class interfaces with the uncompression routines in the libz library.
- Note
This program is only available if libz was available and found when Pike was compiled.
- See also
deflate, compress, uncompress
- Method
create
Gz.inflate Gz.inflate(int|void window_size)
Gz.inflate Gz.inflate(mapping options)
- Description
If called with a mapping as only argument, create accepts the entries window_size (described below) and dictionary, which is a string to be set as dictionary.
The window_size value is passed down to inflateInit2 in zlib.
If the argument is negative, no header checks are done, and no verification of the data will be done either. This is needed for uncompressing ZIP files, for example. The negative value is then negated and handled as a positive value.
Positive arguments set the maximum dictionary size to a power of 2, such that 8 (the minimum) will cause the window size to be 256 bytes, and 15 (the maximum, and default value) will cause it to be 32 KB. Setting this to anything except 15 is rather pointless in Pike.
It can be used to limit the amount of memory that is used to uncompress files, but 32 KB is not all that much in the grand scheme of things.
To decompress files compressed with level 9 compression, a 32 KB window size is needed. Level 1 compression only requires a 256 byte window.
If the options version is used you can specify your own dictionary in addition to the window size.
dictionary : string
window_size : int
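The negative-argument convention can be illustrated with a headerless round trip: a negative level on the packing side and a negative window size on the unpacking side. This is only a sketch; the helper name raw_roundtrip is hypothetical.

```pike
// Hypothetical helper: headerless (raw deflate) round trip.
string(8bit) raw_roundtrip(string(8bit) data)
{
  // Negative level: level 9, but no header is emitted.
  // The flush argument is left out, so the default (Gz.FINISH) applies.
  string(8bit) raw = Gz.deflate(-9)->deflate(data);

  // Negative window size: 32 KB window, no header checks.
  return Gz.inflate(-15)->inflate(raw);
}

int main()
{
  string(8bit) data = "headerless " * 100;
  if (raw_roundtrip(data) != data) error("raw round trip failed\n");
  write("raw round trip ok\n");
  return 0;
}
```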
- Method
end_of_stream
string(8bit) end_of_stream()
- Description
This function returns 0 if the end of stream marker has not yet been encountered, or a string (possibly empty) containing any extra data received following the end of stream marker if the marker has been encountered. If the extra data is not needed, the result of this function can be treated as a logical value.
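One use of end_of_stream is to pick up data that follows a compressed stream in the same input. The sketch below appends a plain-text trailer to a compressed string and retrieves it afterwards; the helper name and the sample strings are assumptions for the demo.

```pike
// Hypothetical helper: return whatever follows the compressed stream.
string(8bit) trailing_data(string(8bit) input)
{
  Gz.inflate unpacker = Gz.inflate();
  unpacker->inflate(input);
  // 0 if the end marker was never reached, otherwise the leftover bytes.
  return unpacker->end_of_stream();
}

int main()
{
  string(8bit) input = Gz.compress("payload") + "TRAILER";
  string(8bit) extra = trailing_data(input);

  if (extra)
    write("stream ended, %d extra bytes: %s\n", sizeof(extra), extra);
  else
    write("end of stream not reached yet\n");
  return 0;
}
```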
- Method
inflate
string(8bit) inflate(string(8bit)|String.Buffer|System.Memory|Stdio.Buffer data)
- Description
This function performs gzip style decompression. It can inflate a whole file at once or in blocks.
- Example
// whole file
write(Gz.inflate()->inflate(Stdio.stdin->read(0x7fffffff)));
// streaming (blocks)
function inflate = Gz.inflate()->inflate;
string s;
// read() returns "" at end of file, which still counts as true in Pike
while (sizeof(s = Stdio.stdin->read(8192)))
  write(inflate(s));
- See also
Gz.deflate->deflate(), Gz.uncompress
Module Bz2
- Description
The Bz2 module contains functions to compress and uncompress strings using the same algorithm as the program bzip2. Compressing and decompressing can be done in streaming mode feeding the compress and decompress objects with arbitrarily large pieces of data.
The Bz2 module consists of three classes: Bz2.Deflate, Bz2.Inflate and Bz2.File. Bz2.Deflate is used to compress data and Bz2.Inflate is used to uncompress data. Bz2.File is used to handle Bzip2 files.
- Note
Note that this module is only available if libbzip2 was available when Pike was compiled.
Note that although the functions in Inflate and Deflate use the same algorithm as bzip2, they do not use the exact same format, so you cannot directly zip files or unzip zipped files using those functions. That is why there exists a third class for files.
Class Bz2.Deflate
- Description
Bz2.Deflate is a builtin program written in C. It interfaces with the packing routines in the bzlib library.
- Note
This program is only available if bzlib was available and found when Pike was compiled.
- See also
Bz2.Inflate()
- Method
create
Bz2.Deflate Bz2.Deflate(int(1..9)|void block_size)
- Description
If given, block_size should be a number from 1 to 9 indicating the block size used when doing compression. The actual block size will be 100000 times this number. Low numbers are considered 'fast', higher numbers are considered 'slow' but give better packing. The parameter is set to 9 if it is omitted.
This function can also be used to re-initialize a Bz2.Deflate object so it can be re-used.
- Method
deflate
string deflate(string data, int(0..2)|void flush_mode)
- Description
This function performs bzip2 style compression on a string data and returns the packed data. Streaming can be done by calling this function several times and concatenating the returned data.
The optional argument flush_mode should be one of the following:
Bz2.BZ_RUN Runs Bz2.Deflate->feed()
Bz2.BZ_FLUSH Runs Bz2.Deflate->read()
Bz2.BZ_FINISH Runs Bz2.Deflate->finish()
- See also
Bz2.Inflate->inflate()
- Method
feed
void feed(string data)
- Description
This function feeds the data to the internal buffers of the Deflate object. All data is buffered until a read or a finish is done.
- See also
Bz2.Deflate->read(), Bz2.Deflate->finish()
- Method
finish
string finish(string data)
- Description
This method feeds the data to the internal buffers of the Deflate object. It then compresses all buffered data, adds an end of data marker to it, returns the compressed data as a string, and reinitializes the Deflate object.
- See also
Bz2.Deflate->feed(), Bz2.Deflate->read()
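The feed and finish methods can be combined as in the following sketch, which buffers all but the last piece with feed and passes the last piece to finish. The helper name bz2_pack is hypothetical and expects at least one piece.

```pike
// Hypothetical helper: compress several pieces into one bzip2 stream.
// Expects a non-empty array.
string bz2_pack(array(string) parts)
{
  Bz2.Deflate packer = Bz2.Deflate(9);

  // feed() only buffers; nothing is returned until read() or finish().
  foreach (parts[..<1], string part)
    packer->feed(part);

  // finish() takes the last piece, compresses everything buffered so far
  // and appends the end of data marker.
  return packer->finish(parts[-1]);
}

int main()
{
  string packed = bz2_pack(({ "alpha ", "beta ", "gamma" }));
  write("%s\n", Bz2.Inflate()->inflate(packed));
  return 0;
}
```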
Class Bz2.File
- Description
Low-level implementation of read/write support for Bzip2 files
- Note
This class is currently not available on Windows.
- Method
create
Bz2.File Bz2.File()
Bz2.File Bz2.File(string filename, void|string mode)
- Description
Creates a Bz2.File object.
- Method
line_iterator
String.SplitIterator|Stdio.LineIterator line_iterator(int|void trim)
- Description
Returns an iterator that will loop over the lines in this file. If trim is true, all '\r' characters will be removed from the input.
- Method
open
bool open(string file, void|string mode)
- Description
Opens a file for I/O.
- Parameter
file The name of the file to be opened.
- Parameter
mode Mode for the file operations. Can be either "r" (read) or "w" (write). Read is the default.
- Method
read
string read(int len)
- Description
Reads len (uncompressed) bytes from the file. If len is omitted the whole file is read. If read is unsuccessful, 0 is returned.
- Method
read_function
function(:string) read_function(int nbytes)
- Description
Returns a function that when called will call read with nbytes as argument. Can be used to get various callback functions, e.g. for the fourth argument to String.SplitIterator.
- Method
read_open
bool read_open(string file)
- Description
Opens a file for reading.
- Parameter
file The name of the file to be opened.
- Method
write
int write(string data)
- Description
Writes the data to the file.
- Returns
The number of bytes written to the file.
Class Bz2.Inflate
- Description
Bz2.Inflate is a builtin program written in C. It interfaces with the unpacking routines in the bzlib library.
- Note
This program is only available if bzlib was available and found when Pike was compiled.
- See also
Deflate
- Method
inflate
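string inflate(string data)
- Description
This function performs bzip2 style decompression. It can do decompression with arbitrarily large pieces of data. When fed with data, it decompresses as much as it can and buffers the rest.
- Example
// Feed the compressed data in 10-byte pieces.
// compressed_data is assumed to hold a complete bzip2 stream.
Bz2.Inflate inflate_object = Bz2.Inflate();
string uncompressed_concatenated_data = "";
for (int i = 0; i < sizeof(compressed_data); i += 10)
  uncompressed_concatenated_data += inflate_object->inflate(compressed_data[i..i+9]);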
- See also
Bz2.Deflate->deflate()
Module HPack
- Description
Implementation of the HPACK (RFC 7541) header packing standard.
This is the header packing system that is used in HTTP/2 (RFC 7540).
- Constant
DEFAULT_HEADER_TABLE_SIZE
constant int HPack.DEFAULT_HEADER_TABLE_SIZE
- Description
This is the default static maximum size of the dynamic header table.
This constant is taken from RFC 7540 section 6.5.2.
- Constant
static_header_tab
constant HPack.static_header_tab
- Description
Table of static headers. RFC 7541 appendix A, Table 1.
Array of array(string(8bit)) with indices 0..60, where each entry is an array of:
string(8bit) 0 Header name.
string(8bit) 1 Default value.
- Note
Note that this table is indexed starting on 0 (zero), while the corresponding table in RFC 7541 starts on 1 (one).
- Variable
static_header_index
protected mapping(string(8bit):int|mapping(string(8bit):int)) HPack.static_header_index
- Description
Index for static_header_tab.
- Note
Note that the indices are offset by 1 (one).
- Note
This variable should be regarded as a constant.
This variable is used to initialize the header index in the Context.
- See also
static_header_tab, Context()->header_index
- Method
create_index
protected mapping(string(8bit):int|mapping(string(8bit):int)) create_index(array(array(string(8bit))) tab)
- Description
Helper function used to create the static_header_index.
- Method
huffman_decode
string(8bit) huffman_decode(string(8bit) str)
- Description
Decodes the string str encoded with the static Huffman code specified in RFC 7541 appendix B.
- Parameter
str String to decode.
- Returns
Returns the decoded string.
- See also
huffman_encode()
- Method
huffman_encode
string(8bit) huffman_encode(string(8bit) str)
- Description
Encodes the string str with the static Huffman code specified in RFC 7541 appendix B.
- Parameter
str String to encode.
- Returns
Returns the encoded string.
- See also
huffman_decode()
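The two Huffman functions are inverses of each other, which the following sketch checks for a sample host name. The helper name huffman_roundtrip is made up for the example.

```pike
// Hypothetical helper: encode with the static Huffman code and decode again.
string(8bit) huffman_roundtrip(string(8bit) plain)
{
  string(8bit) coded = HPack.huffman_encode(plain);
  return HPack.huffman_decode(coded);
}

int main()
{
  string(8bit) plain = "www.example.com";
  string(8bit) coded = HPack.huffman_encode(plain);

  write("%d bytes plain, %d bytes Huffman coded\n",
        sizeof(plain), sizeof(coded));
  if (huffman_roundtrip(plain) != plain) error("round trip failed\n");
  return 0;
}
```

For reference, RFC 7541 appendix C.4.1 lists a 12-octet Huffman form for this particular string, so a conforming encoder should report 12 coded bytes here.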
- Method
update_index
protected void update_index(mapping(string(8bit):int|mapping(string(8bit):int)) index, int i, array(string(8bit)) key)
- Description
Update the specified encoder lookup index.
- Parameter
index Lookup index to add an entry to.
- Parameter
key Lookup key to add.
- Parameter
i Value to store in the index for the key.
Enum HPack.HPackFlags
- Description
Flags for Context()->encode_header() et al.
Class HPack.Context
- Description
Context for an HPack encoder or decoder.
This class implements the majority of RFC 7541.
Functions of interest are typically encode() and decode().
- Variable
dynamic_headers
protected array(array(string(8bit))) HPack.Context.dynamic_headers
- Description
Table of currently available dynamically defined headers.
New entries are appended last, and the first dynamic_prefix elements are not used.
- See also
header_index, add_header()
- Variable
dynamic_max_size
protected int HPack.Context.dynamic_max_size
- Description
Current upper size limit in bytes for dynamic_headers.
- See also
set_dynamic_size()
- Variable
dynamic_prefix
protected int HPack.Context.dynamic_prefix
- Description
Index of the first available header in dynamic_headers.
- Variable
dynamic_size
protected int HPack.Context.dynamic_size
- Description
Current size in bytes of dynamic_headers.
- Variable
header_index
protected mapping(string(8bit):int|mapping(string(8bit):int)) HPack.Context.header_index
- Description
Index into dynamic_headers and static_headers.
"header_name" : mapping(string(8bit):int)|int Indexed on the header name in lower-case. The value is one of:
int Index value for the header value "".
mapping(string(8bit):int) "header_value" : mapping(string(8bit):int) Index value for the corresponding header value.
The index values in turn are coded as follows:
(1..) Index into static_header_tab offset by 1.
0 Not used.
(..-1) Inverted (`~()) index into dynamic_headers.
- See also
dynamic_headers, static_header_tab, add_header()
- Variable
static_max_size
protected int HPack.Context.static_max_size
- Description
Static upper size limit in bytes for dynamic_headers.
- See also
create(), set_dynamic_size()
- Method
add_header
int(0)|int(62) add_header(string(8bit) header, string(8bit) value)
- Description
Add a header to the table of known headers and to the header index.
- Parameter
header Name of header to add.
- Parameter
value Value of the header.
- Returns
Returns 0 (zero) if the header was too large to store. Returns the encoding key for the header on success (this is always sizeof(static_header_tab) + 1 (i.e. 62), as new headers are prepended to the dynamic header table).
- Note
Adding a header may cause old headers to be evicted from the table.
- See also
get_indexed_header()
- Method
create
HPack.Context HPack.Context(int|void protocol_dynamic_max_size)
- Description
Create a new HPack Context.
- Parameter
static_max_size This is the static maximum size in bytes (as calculated by RFC 7541 section 4.1) of the dynamic header table. It defaults to DEFAULT_HEADER_TABLE_SIZE, and is the upper limit for set_dynamic_size().
- See also
set_dynamic_size()
- Method
decode
array(array(string(8bit)|HPackFlags)) decode(Stdio.Buffer buf)
- Description
Decode a HPack header block.
- Parameter
buf Input buffer.
- Returns
Returns an array of headers. Cf. decode_header().
- See also
decode_header(), encode()
- Method
decode_header
array(string(8bit)|HPackFlags) decode_header(Stdio.Buffer buf)
- Description
Decode a single HPack header.
- Parameter
buf Input buffer.
- Returns
Returns UNDEFINED on empty buffer. Returns an array with a header and value otherwise:
string(8bit) 0 Name of the header. Under normal circumstances this is always lower-case, but no check is currently performed.
string(8bit) 1 Value of the header.
HPackFlags|void 2 Optional encoding flags. Only set for fields having HEADER_NEVER_INDEXED.
The elements in the array are in the same order and compatible with the arguments to encode_header().
- Throws
Throws on encoding errors.
- Note
The returned array MUST NOT be modified.
- Note
In future implementations the result array may get extended with a flag field.
- Note
The in-band signalling of encoding table sizes is handled internally.
- See also
decode(), encode_header()
- Method
encode
void encode(array(array(string(8bit)|HPackFlags)) headers, Stdio.Buffer buf)
- Description
Encode a full set of headers.
- Parameter
headers An array of ({ header, value })-tuples.
- Parameter
buf Output buffer.
- See also
encode_header(), decode()
- Method
encode
variant string(8bit) encode(array(array(string(8bit))) headers)
- Description
Convenience variant of encode().
- Parameter
headers An array of ({ header, value })-tuples.
- Returns
Returns the corresponding HPack encoding.
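A header block round trip could be sketched as below, using one Context for encoding and a separate one for decoding, as the two ends of a connection would. The helper name hpack_roundtrip and the sample headers are made up for the example.

```pike
// Hypothetical helper: encode a header list and decode it with a fresh context.
array hpack_roundtrip(array(array(string(8bit))) headers)
{
  HPack.Context encoder = HPack.Context();
  HPack.Context decoder = HPack.Context();

  // Convenience variant: returns the encoded block as a string.
  string(8bit) block = encoder->encode(headers);

  // decode() reads from a Stdio.Buffer.
  return decoder->decode(Stdio.Buffer(block));
}

int main()
{
  array(array(string(8bit))) headers = ({
    ({ ":method", "GET" }),
    ({ ":path", "/index.html" }),
    ({ "x-example", "1" }),
  });

  // Each decoded entry starts with the header name and value.
  foreach (hpack_roundtrip(headers), array header)
    write("%s: %s\n", header[0], header[1]);
  return 0;
}
```

Each context keeps its own dynamic table, so the same encoder and decoder pair should be reused for all header blocks of a connection.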
- Method
encode_header
void encode_header(Stdio.Buffer buf, string(8bit) header, string(8bit) value, HPackFlags|void flags)
- Description
Encode a single HPack header.
- Parameter
buf Output buffer.
- Parameter
header Name of header. This should under normal circumstances be a lower-case string, but this is currently not checked.
- Parameter
value Header value.
- Parameter
flags Optional encoding flags.
- See also
encode(), decode_header()
- Method
evict_dynamic_headers
protected void evict_dynamic_headers()
- Description
Evict dynamic headers until dynamic_size goes below dynamic_max_size.
- Method
get_indexed_header
array(string(8bit)) get_indexed_header(int(1..) index)
- Description
Look up a known header.
- Parameter
index Encoding key for the header to retrieve.
- Returns
Returns UNDEFINED on unknown header. Returns an array with a header and value otherwise:
string(8bit) 0 Name of the header. Under normal circumstances this is always lower-case, but no check is currently performed.
string(8bit) 1 Value of the header.
- See also
add_header()
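The relation between encoding keys, the static table and the dynamic table can be illustrated as follows. Key 2 is the ":method: GET" entry of RFC 7541 appendix A; the header name x-example and the helper name add_and_lookup are made up for the example.

```pike
// Hypothetical helper: add a header and look it up again by its key.
array(string(8bit)) add_and_lookup(HPack.Context ctx,
                                   string(8bit) name, string(8bit) value)
{
  int key = ctx->add_header(name, value);
  if (!key) return 0;   // header was too large to store
  return ctx->get_indexed_header(key);
}

int main()
{
  HPack.Context ctx = HPack.Context();

  // Keys 1..61 refer to the static table (1-based, as in the RFC).
  array(string(8bit)) method = ctx->get_indexed_header(2);
  write("key 2: %s = %s\n", method[0], method[1]);

  // New headers get the first key after the static table.
  array(string(8bit)) added = add_and_lookup(ctx, "x-example", "1");
  write("added: %s = %s\n", added[0], added[1]);
  return 0;
}
```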
- Method
put_int
protected void put_int(Stdio.Buffer buf, int(8bit) bits, int(8bit) mask, int value)
- Description
Encode an integer with the HPack integer encoding.
- Parameter
buf Output buffer.
- Parameter
bits Bits that should always be set in the first byte of output.
- Parameter
mask Bitmask for the value part of the first byte of output.
- Parameter
value Integer value to encode.
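The HPack integer encoding itself is defined in RFC 7541 section 5.1. The following is a standalone sketch of that algorithm, not the module's own put_int: the function name encode_prefix_int is hypothetical, and it returns a string instead of writing to a Stdio.Buffer. The bits and mask arguments are assumed to have the same meaning as in put_int.

```pike
// Hypothetical standalone sketch of the RFC 7541 section 5.1 integer encoding.
// bits: flag bits for the first byte, mask: 2^N - 1 for an N-bit prefix.
string encode_prefix_int(int bits, int mask, int value)
{
  // Small values fit in the prefix of the first byte.
  if (value < mask)
    return sprintf("%c", bits | value);

  // Otherwise fill the prefix with ones and continue with the remainder.
  String.Buffer out = String.Buffer();
  out->add(sprintf("%c", bits | mask));
  value -= mask;

  // Emit 7 bits at a time, least significant first,
  // with the high bit set on all but the last byte.
  while (value >= 128) {
    out->add(sprintf("%c", (value & 127) | 128));
    value >>= 7;
  }
  out->add(sprintf("%c", value));
  return out->get();
}

int main()
{
  // RFC 7541 C.1.2: 1337 with a 5-bit prefix encodes as 31, 154, 10.
  write("%{%d %}\n", (array(int))encode_prefix_int(0, 31, 1337));
  return 0;
}
```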
- Method
put_string
protected void put_string(Stdio.Buffer buf, string(8bit) str)
- Description
Encode a string with the HPack string encoding.
- Parameter
buf Output buffer.
- Parameter
str String to output.
The encoder will huffman_encode() the string if that renders a shorter encoding than the verbatim string.
- Method
set_dynamic_size
void set_dynamic_size(Stdio.Buffer buf, int(0..) new_max_size)
- Description
Set the dynamic maximum size of the dynamic header lookup table.
- Parameter
buf Output buffer.
- Parameter
new_max_size New dynamic maximum size in bytes (as calculated by RFC 7541 section 4.1).
- Note
This function can be used to clear the dynamic header table by setting the size to zero.
- Note
Also note that new_max_size has an upper bound that is limited by static_max_size.
- See also
encode_header(), encode(), create()