In this chapter we will discuss all the different ways to store data in Pike in detail. We have seen examples of many of these, but we haven't really gone into how they work. In this chapter we will also see which operators and functions work with the different types.
Types in Pike are used in two different contexts: at compile-time and at run-time. Some types are only used at compile-time (void, mixed and all constructed types); all other types are also used at run-time. Also note the following functions and special forms:
typeof(mixed x)
_typeof(mixed x)
There are two categories of run-time data types in Pike: basic types and pointer types. The difference is that basic types are copied when assigned to a variable. With pointer types, only the pointer is copied, so you get two variables pointing to the same thing.
The basic types are int, float and string. For you who are accustomed to C or C++, it may seem odd that a string is a basic type as opposed to an array of char, but it is surprisingly easy to get used to.
Int is short for integer, or integer number. They are normally 32 bit integers, which means that they are in the range -2147483648 to 2147483647. (Note that on some machines an int might be larger than 32 bits.) If Pike is compiled with bignum support the 32 bit limitation does not apply and thus the integers can be of arbitrary size. Since they are integers, no decimals are allowed. An integer constant can be written in several ways:
Pattern | Example | Description |
-?[1-9][0-9]* | 78 | Decimal number |
-?0[0-9]* | 0116 | Octal number |
-?0[xX][0-9a-fA-F]+ | 0x4e | Hexadecimal number |
-?0[bB][01]+ | 0b1001110 | Binary number |
-?'\\?.' | 'N' | ASCII character |
All of the above represent the number 78. Octal notation means that each digit is worth 8 times as much as the one after it. Hexadecimal notation means that each digit is worth 16 times as much as the one after it, and it uses the letters a, b, c, d, e and f to represent the numbers 10, 11, 12, 13, 14 and 15. In binary notation every digit is worth twice the value of the succeeding digit, and only 1s and 0s are used. The ASCII notation gives the ASCII value of the character between the single quotes. In this case the character is N, which happens to be 78 in ASCII. Some characters, such as newlines, cannot be placed within single quotes. The escape sequence for those characters, listed under strings, must be used instead. This applies in particular to the single quote character itself, which has to be written as '\''.
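The notations in the table can be compared side by side. The following is a minimal sketch; the helper name spellings is made up for this illustration.

```pike
// Each literal below is a different spelling of the same integer, 78.
// (spellings is a hypothetical helper, not a Pike builtin.)
array(int) spellings()
{
  return ({ 78, 0116, 0x4e, 0b1001110, 'N' });
}

int main()
{
  foreach (spellings(), int value)
    write("%d\n", value);   // prints 78 five times
  write("%d\n", '\'');      // the single quote must be escaped; prints 39
  write("%d\n", '\n');      // a newline written as an escape sequence; prints 10
  return 0;
}
```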
When Pike is compiled with bignum support, integers never overflow or underflow when they reach the system-defined maxint/minint. Instead they are silently converted into bignums. Integers are usually implemented as two's-complement 32-bit integers, and are thus limited to the range -2147483648 to 2147483647. This may however vary between platforms, especially 64-bit platforms. FIXME: Conversion back to normal integer?
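As an illustration of the behaviour described above, the following sketch steps past the 32-bit limit. It assumes a Pike that was built with bignum support; without it the results would differ.

```pike
int main()
{
  int big = 2147483647;   // largest 32-bit signed value
  big += 1;               // no wrap-around to a negative number
  write("%d\n", big);     // 2147483648
  // A value far beyond any native integer size (assumes bignum support).
  write("%d\n", pow(2, 100));   // 1267650600228229401496703205376
  return 0;
}
```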
All the arithmetic, bitwise and comparison operators can be used on integers. Also note these functions:
intp(mixed x)
random(int x)
reverse(int x)
sqrt(int x)
Although most programs only use integers, they are impractical for trigonometric calculations, transformations or anything else where you need decimals. For this purpose you use float. Floats are normally 32 bit floating point numbers, which means that they can represent very large and very small numbers, but only with 9 accurate digits. To write a floating point constant, you just put in the decimals or write it in the exponential form:
Pattern | Example | Equals |
-?[0-9]*\.[0-9]+ | 3.1415926 | 3.1415926 |
-?[0-9]+e-?[0-9]+ | -5e3 | -5000.0 |
-?[0-9]*\.[0-9]+e-?[0-9]+ | .22e-2 | 0.0022 |
Of course you can have any number of decimals to increase the accuracy. Usually digits after the ninth digit are ignored, but on some architectures float might have higher accuracy than that. In the exponential form, e means "times 10 to the power of", so 1.0e9 is equal to "1.0 times 10 to the power of 9". FIXME: float and int are not compatible and there is no implicit cast like in C++.
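Since conversions between int and float are not implicit, a cast has to be written out. The sketch below shows one way to do that; the variable names are only for illustration.

```pike
int main()
{
  int i = 7;
  float half = (float)i / 2;    // the cast makes this a float division: 3.5
  int truncated = (int)3.99;    // casting to int discards the decimals: 3
  write("%f\n", half);
  write("%d\n", truncated);
  write("%f\n", .22e-2);        // a constant written in the exponential form
  return 0;
}
```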
All the arithmetic and comparison operators can be used on floats. Also, these functions operate on floats:
sin, asin, cos, acos, tan and atan: if you do not know what these functions do you probably don't need them. Asin, acos and atan are of course short for arc sine, arc cosine and arc tangent. On a calculator they are often known as inverse sine, inverse cosine and inverse tangent.
log(float x)
exp(float x)
pow(float|int x, float|int y)
sqrt(float x)
floor(float x)
ceil(float x)
round(float x)
A string can be seen as an array of values from 0 to 2³²-1. Usually a string contains text such as a word, a sentence, a page or even a whole book. But it can also contain parts of a binary file, compressed data or other binary data. Strings in Pike are shared, which means that identical strings share the same memory space. This greatly reduces memory usage for most applications and also speeds up string comparisons. We have already seen how to write a constant string:
"hello world"   // hello world
"he" "llo"      // hello
"\116"          // N (116 is the octal ASCII value for N)
"\t"            // A tab character
"\n"            // A newline character
"\r"            // A carriage return character
"\b"            // A backspace character
"\0"            // A null character
"\""            // A double quote character
"\\"            // A single backslash
"\x4e"          // N (4e is the hexadecimal ASCII value for N)
"\d78"          // N (78 is the decimal ASCII value for N)
"hello world\116\t\n\r\b\0\"\\" // All of the above
"\xff"          // the character 255
"\xffff"        // the character 65535
"\xffffff"      // the character 16777215
"\116""3"       // 'N' followed by a '3'
Pattern | Example |
. | N |
\\[0-7]+ | \116 |
\\x[0-9a-fA-F]+ | \x4e |
\\d[0-9]+ | \d78 |
\\u[0-9a-fA-F]+ (4) | \u004E |
\\U[0-9a-fA-F]+ (8) | \U0000004e |
Sequence | ASCII code | Character |
\a | 7 | An alert (bell) character |
\b | 8 | A backspace character |
\t | 9 | A tab character |
\n | 10 | A newline character |
\v | 11 | A vertical tab character |
\f | 12 | A form feed character |
\r | 13 | A carriage return character |
\" | 34 | A double quote character |
\\ | 92 | A backslash character |
As you can see, any sequence of characters within double quotes is a string. The backslash character is used to escape characters that are not allowed or are impossible to type. \t is the sequence that produces a tab character, \\ is used when you want one backslash, and \" is used when you want a double quote (") to be part of the string instead of ending it. Also, \XXX where XXX is an octal number from 0 to 37777777777, or \xXX where XX is a hexadecimal number from 0 to ffffffff, lets you write any character you want in the string, even null characters. From version 0.6.105, you may also use \dXXX where XXX is a decimal number from 0 to 2³²-1. If you write two constant strings after each other, they will be concatenated into one string.
You might be surprised to see that individual characters can have values up to 2³²-1 and wonder how much memory that uses. Do not worry: Pike automatically decides the proper amount of memory for a string, so all strings with character values in the range 0-255 will be stored with one byte per character. Beware that not all functions can handle strings which are not stored as one byte per character, so there are some limits to when this feature can be used.
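The storage width that Pike picked for a string can be inspected with String.width, which reports the number of bits per character. A small sketch, with character values chosen only for illustration:

```pike
int main()
{
  write("%d\n", String.width("hello"));     // 8: every character fits in one byte
  write("%d\n", String.width("\x20ac"));    // 16: the character 0x20ac needs two bytes
  write("%d\n", String.width("\x1f600"));   // 32: the character 0x1f600 needs four bytes
  write("%d\n", sizeof("\x1f600"));         // 1: the length counts characters, not bytes
  return 0;
}
```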
Although strings are a form of arrays, they are immutable. This means that there is no way to change an individual character within a string without creating a new string. This may seem strange, but keep in mind that strings are shared, so if you could change a character in the string "foo", you would change *all* "foo" everywhere in the program.
However, the Pike compiler will allow you to write code as if you could change characters within strings. The following code is valid and works:
string s="hello torld"; s[6]='w';
However, you should be aware that this does in fact create a new string, and it may need to copy the string s to do so. This means that the above operation can be quite slow for large strings. You have been warned. Most of the time, you can use replace, sscanf, `/ or some other high-level string operation to avoid having to use the above construction too much.
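As a sketch of the high-level alternatives mentioned above, the same correction can be made with replace, and a string can be split and joined with `/ and `*. The variable names are only illustrative.

```pike
int main()
{
  string s = "hello torld";
  string corrected = replace(s, "torld", "world");
  write("%s\n", corrected);   // hello world
  write("%s\n", s);           // hello torld (the original string is unchanged)

  // `/ splits a string on a separator, `* joins the pieces again.
  array(string) words = "one two three" / " ";
  write("%s\n", words * ", ");   // one, two, three
  return 0;
}
```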
All the comparison operators plus the operators listed here can be used on strings:
Also, these functions operate on strings:
String.capitalize(string s)
String.count(string haystack, string needle): equivalent to sizeof(haystack/needle)-1.
String.width(string s)
lower_case(string s)
replace(string s, string from, string to)
reverse(string s)
search(string haystack, string needle)
sizeof(string s): same as strlen(s), returns the length of the string.
stringp(mixed s)
strlen(string s)
upper_case(string s)
The basic types are, as the name implies, very basic. They are the foundation; most of the pointer types are merely interesting ways to store the basic types. The pointer types are array, mapping, multiset, program, object and function. They are all pointers, which means that they point to something in memory. This "something" is freed when there are no more pointers to it. Assigning a value of a pointer type to a variable will not copy this "something"; instead it will only generate a new reference to it. Special care sometimes has to be taken when giving one of these types as arguments to a function; the function can in fact modify the "something". If this effect is not wanted you have to explicitly copy the value. More about this will be explained later in this chapter.
Arrays are the simplest of the pointer types. An array is merely a block of memory with a fixed size containing a number of slots which can hold any type of value. These slots are called elements and are accessible through the index operator. To write a constant array you enclose the values you want in the array with ({ }) like this:
({ })              // Empty array
({ 1 })            // Array containing one element of type int
({ "" })           // Array containing a string
({ "", 1, 3.0 })   // Array of three elements, each of different type
As you can see, each element in the array can contain any type of value. Indexing and ranges on arrays work just like on strings, except that with arrays you can change values inside the array with the index operator. However, there is no way to change the size of the array, so if you want to append values to the end you still have to add them to another array, which creates a new array. Figure 4.1 shows the schematics of an array. As you can see, it is a very simple memory structure.
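The difference between changing an element and appending one can be sketched as follows; the variable names are made up for the example.

```pike
int main()
{
  array(mixed) a = ({ "", 1, 3.0 });
  a[1] = 42;                            // changes the element in place
  array(mixed) b = a + ({ "new" });     // appending builds a new array
  write("%d %d\n", sizeof(a), sizeof(b));   // 3 4
  write("%d\n", a[1]);                      // 42
  write("%d\n", sizeof(a[1..2]));           // 2: ranges work as on strings
  return 0;
}
```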
Operators and functions usable with arrays:
aggregate(mixed ... elems)
allocate(int size)
arrayp(mixed a)
column(array(mixed) a, mixed ind)
equal(mixed a, mixed b)
filter(array a, mixed func, mixed ... args): returns the elements from a for which func returns true. (See the reference for filter for details.)
map(array a, mixed func, mixed ... args): similar to filter, but returns the results of the function func instead of returning the elements from a for which func returns true. (Like filter, this function accepts other things for a and func; see the reference for map.)
replace(array a, mixed from, mixed to)
reverse(array a)
rows(array a, array indexes): similar to column. It indexes a with each element from indexes and returns the results in an array. For example: rows( ({"a","b","c"}), ({ 2,1,2,0}) ) will return ({"c","b","c","a"}).
search(array haystack, mixed needle)
sizeof(mixed arr)
sort(array arr, array ... rest)
Array.uniq(array a)
Mappings are really just more generic arrays. However, they are slower and use more memory than arrays, so they cannot replace arrays completely. What makes mappings special is that they can be indexed on other things than integers. We can imagine that a mapping looks like this:
Each index-value pair is floating around freely inside the mapping. There is exactly one value for each index. We also have a (magical) lookup function. This lookup function can find any index in the mapping very quickly. Now, if the mapping is called m and we index it like this: m [ i ] the lookup function will quickly find the index i in the mapping and return the corresponding value. If the index is not found, zero is returned instead. If we on the other hand assign an index in the mapping the value will instead be overwritten with the new value. If the index is not found when assigning, a new index-value pair will be added to the mapping. Writing a constant mapping is easy:
([ ])                       // Empty mapping
([ 1:2 ])                   // Mapping with one index-value pair, the 1 is the index
([ "one":1, "two":2 ])      // Mapping which maps words to numbers
([ 1:({2.0}), "":([]), ])   // Mapping with lots of different types
As with arrays, mappings can contain any type. The main difference is that the index can be any type too. Also note that the index-value pairs in a mapping are not stored in a specific order. You cannot refer to the fourteenth index-value pair, since there is no way of telling which one is the fourteenth. Because of this, you cannot use the range operator on mappings.
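The lookup and assignment rules for mappings can be tried out with a short sketch like the one below. It also uses zero_type to tell a stored zero from a missing index; the mapping contents are invented for the example.

```pike
int main()
{
  mapping(string:int) m = ([ "one": 1, "two": 2 ]);
  m["three"] = 3;     // index not found when assigning: a new pair is added
  m["one"] = 0;       // index found: the old value is overwritten
  write("%d\n", m["two"]);               // 2
  write("%d\n", m["four"]);              // 0: index not found
  write("%d\n", zero_type(m["four"]));   // 1: this zero means "not found"
  write("%d\n", zero_type(m["one"]));    // 0: this zero is a stored value
  write("%d\n", sizeof(m));              // 3

  // indices and values come back in matching order, so the mapping can be rebuilt.
  mapping rebuilt = mkmapping(indices(m), values(m));
  write("%d\n", equal(m, rebuilt));      // 1
  return 0;
}
```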
The following operators and functions are important:
indices(mapping m)
m_delete(mapping m, mixed ind)
mappingp(mixed m)
mkmapping(array ind, array val)
replace(mapping m, mixed from, mixed to)
search(mapping m, mixed val)
sizeof(mapping m)
values(mapping m): same as indices, but returns an array with all the values instead. If indices and values are called on the same mapping after each other, without any other mapping operations in between, the returned arrays will be in the same order. They can in turn be used as arguments to mkmapping to rebuild the mapping m again.
zero_type(mixed t): if the value was present in the mapping, zero_type will return something else than 1.
A multiset is almost the same thing as a mapping. The difference is that there are no values:
Instead, the index operator will return 1 if the value was found in the multiset and 0 if it was not. When assigning an index to a multiset like this: mset[ ind ] = val the index ind will be added to the multiset mset if val is true. Otherwise ind will be removed from the multiset instead.
Writing a constant multiset is similar to writing an array:
(< >)                 // Empty multiset
(< 17 >)              // Multiset with one index: 17
(< "", 1, 3.0, 1 >)   // Multiset with four indices
Note that you can actually have more than one of the same index in a multiset. This is normally not used, but can be practical at times.
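The membership test and the assignment rule for multisets can be sketched like this; the multiset contents are only an example.

```pike
int main()
{
  multiset(string) seen = (< "apple", "pear" >);
  write("%d\n", seen["apple"]);   // 1: present
  write("%d\n", seen["plum"]);    // 0: absent
  seen["plum"] = 1;               // a true value adds the index
  seen["pear"] = 0;               // a false value removes the index
  write("%d %d\n", seen["plum"], seen["pear"]);   // 1 0
  return 0;
}
```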
Normally, when we say program we mean something we can execute from a shell prompt. However, Pike has another meaning for the same word. In Pike a program is the same as a class in C++. A program holds a table of what functions and variables are defined in that program. It also holds the code itself, debug information and references to other programs in the form of inherits. A program does not hold space to store any data however. All the information in a program is gathered when a file or string is run through the Pike compiler. The variable space needed to execute the code in the program is stored in an object which is the next data type we will discuss.
Writing a program is easy, in fact, every example we have tried so far has been a program. To load such a program into memory, we can use compile_file which takes a file name, compiles the file and returns the compiled program. It could look something like this:
program p = compile_file("hello_world.pike");
You can also use the cast operator like this:
program p = (program) "hello_world";
This will also load the program hello_world.pike, the only difference is that it will cache the result so that next time you do (program)"hello_world" you will receive the _same_ program. If you call compile_file("hello_world.pike") repeatedly you will get a new program each time.
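The "new program each time" behaviour can be observed without a file on disk by using compile_string, which is described further down. This is only a sketch: the source string and the function name answer are invented for the example.

```pike
int main()
{
  // Hypothetical one-line program used only for this illustration.
  string source = "int answer() { return 17; }";
  program first = compile_string(source);
  program second = compile_string(source);
  write("%d\n", first == second);              // 0: each compilation gives a new program
  object o = first();                          // clone the program
  write("%d\n", o->answer());                  // 17
  write("%d\n", object_program(o) == first);   // 1
  return 0;
}
```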
There is also a way to write programs inside programs with the help of the class keyword:
class class_name { inherits, variables and functions }
The class keyword can be written as a separate entity outside of all functions, but it is also an expression which returns the program written between the brackets. The class_name is optional. If used you can later refer to that program by the name class_name. This is very similar to how classes are written in C++ and can be used in much the same way. It can also be used to create structs (or records if you program Pascal). Let's look at an example:
class record {
  string title;
  string artist;
  array(string) songs;
}

array(record) records = ({});

void add_empty_record()
{
  records+=({ record() });
}

void show_record(record rec)
{
  write("Record name: "+rec->title+"\n");
  write("Artist: "+rec->artist+"\n");
  write("Songs:\n");
  foreach(rec->songs, string song)
    write(" "+song+"\n");
}
This could be a small part of a better record register program. It is not a complete executable program in itself. In this example we create a program called record which has three identifiers. In add_empty_record a new object is created by calling record. This is called cloning and it allocates space to store the variables defined in the class record. Show_record takes one of the records created in add_empty_record and shows the contents of it. As you can see, the arrow operator is used to access the data allocated in add_empty_record. If you do not understand this section I suggest you go on and read the next section about objects and then come back and read this section again.
program compile(string p);
program compile_file(string filename);
program compile_string(string p, string filename);
compile_file simply reads the file given as argument, compiles it and returns the resulting program. compile_string instead compiles whatever is in the string p. The second argument, filename, is only used in debug printouts when an error occurs in the newly made program. Both compile_file and compile_string call compile to actually compile the string after having called cpp on it.
programp(mixed p)
The following operators and functions are important:
indices(program p)
values(program p)
Although programs are absolutely necessary for any application you might want to write, they are not enough. A program doesn't have anywhere to store data; it merely outlines how to store data. To actually store the data you need an object. An object is basically a chunk of memory with a reference to the program from which it was cloned. Many objects can be made from one program. The program outlines where in the object different variables are stored.
Each object has its own set of variables, and when calling a function in that object, that function will operate on those variables. If we take a look at the short example in the section about programs, we see that it would be better to write it like this:
class record {
  string title;
  string artist;
  array(string) songs;

  void show()
  {
    write("Record name: "+title+"\n");
    write("Artist: "+artist+"\n");
    write("Songs:\n");
    foreach(songs, string song)
      write(" "+song+"\n");
  }
}

array(record) records = ({});

void add_empty_record()
{
  records+=({ record() });
}

void show_record(object rec)
{
  rec->show();
}
Here we can clearly see how the function show prints the contents of the variables in that object. In essence, instead of accessing the data in the object with the -> operator, we call a function in the object and have it write the information itself. This type of programming is very flexible, since we can later change how record stores its data, but we do not have to change anything outside of the record program.
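That each object carries its own set of variables can be seen in a smaller sketch. The class Counter and its members are hypothetical names used only here.

```pike
// Counter is an illustrative class, not part of Pike.
class Counter
{
  int count;                 // every object gets its own count

  void bump() { count++; }   // operates on the variables of the object it is called in
}

int main()
{
  Counter a = Counter();
  Counter b = Counter();
  a->bump();
  a->bump();
  b->bump();
  write("%d %d\n", a->count, b->count);   // 2 1
  return 0;
}
```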
Functions and operators relevant to objects:
destruct(object o)
indices(object o)
object_program(object o)
objectp(mixed o)
this_object()
values(object o)
When indexing an object on a string, and that string is the name of a function in the object, a function is returned. Despite its name, a function is really a function pointer. When the function pointer is called, the interpreter sets this_object() to the object in which the function is located and proceeds to execute the function it points to. Also note that function pointers can be passed around just like any other data type:
int foo() { return 1; }
function bar() { return foo; }
int gazonk() { return foo(); }
int teleledningsanka() { return bar()(); }
In this example, the function bar returns a pointer to the function foo. No indexing is necessary since the function foo is located in the same object. The function gazonk simply calls foo. However, note that the word foo in that function is an expression returning a function pointer that is then called. To further illustrate this, foo has been replaced by bar() in the function teleledningsanka.
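A function pointer obtained by indexing an object keeps its connection to that object. The sketch below illustrates this; Greeter, twice and add_three are names invented for the example.

```pike
// Greeter, twice and add_three are hypothetical names for this illustration.
class Greeter
{
  string name = "world";
  string greet() { return "hello " + name; }
}

// Takes a function pointer as an ordinary argument and calls it twice.
int twice(function f, int x)
{
  return f(f(x));
}

int add_three(int x) { return x + 3; }

int main()
{
  Greeter g = Greeter();
  function f = g->greet;        // indexing the object yields a function pointer
  g->name = "Pike";
  write("%s\n", f());           // hello Pike: f runs with g as this_object()
  write("%d\n", functionp(f));                // 1
  write("%d\n", function_object(f) == g);     // 1
  write("%d\n", twice(add_three, 1));         // 7
  return 0;
}
```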
For convenience, there is also a simple way to write a function inside another function. To do this you use the lambda keyword. The syntax is the same as for a normal function, except you write lambda instead of the function name:
lambda ( types ) { statements }
The major difference is that this is an expression that can be used inside another function. Example:
function bar() { return lambda() { return 1; }; }
This is the same as the first two lines in the previous example, the keyword lambda allows you to write the function inside bar.
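Because a lambda is an expression, it can be stored in a variable or handed directly to functions such as map and filter. A minimal sketch, with illustrative variable names:

```pike
int main()
{
  function square = lambda(int x) { return x * x; };
  write("%d\n", square(7));          // 49

  // The lambda is written where the function argument is expected.
  array doubled = map(({ 1, 2, 3 }), lambda(int x) { return x * 2; });
  write("%d\n", doubled[2]);         // 6

  array odd = filter(({ 1, 2, 3, 4, 5 }), lambda(int x) { return x & 1; });
  write("%d\n", sizeof(odd));        // 3
  return 0;
}
```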
Note that unlike C++ and Java you can not use function overloading in Pike. This means that you cannot have one function called 'foo' which takes an integer argument and another function 'foo' which takes a float argument.
This is what you can do with a function pointer.
function_name(function f)
function_object(function f)
functionp(mixed f)
this_function()
There are two types that are pure compile-time types:
The type void is used to indicate the absence or optionality of a value. There are two typical use cases:
UNDEFINED.
When creating functions with optional parameters the following functions may be of interest:
undefinedp(mixed x): returns 1 if x is UNDEFINED, and 0 otherwise.
query_num_arg()
The type mixed is used to indicate that values of any type may be passed here, and that the actual type of the values that will be used at run-time is totally unknown.
This type is typically used when implementing container classes (where the actual values won't be manipulated by the code in the class), or as a convenience fall-back when the actual compile-time type is getting too complicated.
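Both compile-time types discussed so far can appear in a single declaration. The sketch below uses mixed for values that are merely passed through, and mixed|void for an optional parameter; the function lookup and the table contents are invented for the example.

```pike
// lookup is a hypothetical helper: it returns the value stored under key,
// or fallback when the key is missing. fallback itself may be left out.
mixed lookup(mapping(string:mixed) table, string key, mixed|void fallback)
{
  mixed value = table[key];
  if (undefinedp(value))    // the index was not found
    return fallback;
  return value;
}

int main()
{
  mapping(string:mixed) table = ([ "a": 1, "zero": 0 ]);
  write("%O\n", lookup(table, "a"));                  // 1
  write("%O\n", lookup(table, "zero", "n/a"));        // 0: a stored zero is not "missing"
  write("%O\n", lookup(table, "missing", "n/a"));     // "n/a"
  write("%d\n", undefinedp(lookup(table, "missing")));   // 1: no fallback was given
  return 0;
}
```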
The type __unknown__ is used to indicate that nothing is known about the type of the value. It is the inverse of mixed|void.
It is most commonly used as the type for the content of empty container types, e.g. ({}) (array(__unknown__)) or ([]) (mapping(__unknown__:__unknown__)), or as the type for the many field in a callback function, e.g. function(int, __unknown__...:int).
Note this type is new in Pike 9.0. In Pike 8.0 and earlier mixed was used for this in most contexts.
Furthermore, more specific compile-time types may be constructed either by subtyping the basic types by specifying parameters (e.g. function(string(7bit), int(0..): string(7bit)) instead of just plain function), or by using the type union operator (`|) to specify several alternative types (e.g. int|float instead of mixed). Note that the run-time type may differ from the declared compile-time type (e.g. function(string, int: string(7bit)) and int(17..17) respectively).
To improve the type-safety over just using mixed, it is possible to define place-holder types that are replaced with actual types when the class is used.
class Container (<T>) (T|void content) {}

Container(<int>) int_container = Container(<int>)(17);
Container(<float>) float_container = Container(<float>)(17.0);
Container mixed_container = Container("foo");
The above class is similar to:
class Container(mixed|void content) {}
But the former allows the compiler to eg check that only values of the expected types are put in int_container and float_container.
Modifiers are keywords that may be specified before the type of a symbol declaration or inherit. They typically affect the symbol lookup or other related compiler behavior.
The protected modifier hides symbols from external indexing (i.e. they are still accessible to subclasses that have inherited the class, but not via predef::`->() or predef::`[]()).
In ancient versions of Pike this modifier was known as static.
The local modifier causes use of the symbol from the current class to not be affected by overloading by subclasses.
This modifier is also available with the name inline.
The modifier final is similar.
The private modifier hides symbols from internal indexing (i.e. they are not accessible to subclasses) and implies protected and local.
The final modifier causes the compiler to issue an error if a subclass attempts to overload the symbol.
The modifier local is similar, but more permissive.
Note in ancient versions of Pike this modifier was also available with the name nomask.
The optional modifier causes the type checker to consider the symbol as optional to implement to satisfy the API (albeit if the symbol exists it still must comply with the type).
The extern modifier indicates that a symbol may be implemented by a subclass, but does not define it in the current class. It implies optional.
The public modifier causes inherited private symbols to become local protected (and thus available to both the inheriting class and subsequent inherits, albeit not overrideable).
Note that the public modifier is only useful for inherit statements. In all other cases it is essentially a no-op.
The variant modifier is used to provide alternative APIs for functions. The different functions will be called depending on what the arguments are when the symbol is called.
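A sketch of how variant functions might be declared follows. It assumes a Pike version that supports the variant modifier; the class Describer and its functions are invented for the example.

```pike
// Describer is a hypothetical class; which describe is called depends on
// the type of the argument it is called with.
class Describer
{
  variant string describe(int x) { return "int " + x; }
  variant string describe(string s) { return "string " + s; }
}

int main()
{
  Describer d = Describer();
  write("%s\n", d->describe(78));    // int 78
  write("%s\n", d->describe("N"));   // string N
  return 0;
}
```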
The __weak__ modifier is used to indicate to the garbage collector that it may clear the variable if it is the only holder of the value.
Note that this modifier is new in Pike 9.0.
The __unused__ modifier is used to inhibit the warning that the symbol is not used.
Note that this modifier is new in Pike 9.0.
The __generator__ modifier converts a function into a function that returns a restartable function.
__generator__ int counter(int start, int stop)
{
  while (start < stop) {
    continue return start++;
  }
  return stop;
}
The above behaves similar to:
function(:int) counter(int start, int stop)
{
  return lambda() {
    if (start <= stop) {
      return start++;
    }
    return UNDEFINED;
  };
}
Note that this modifier is new in Pike 9.0.
The __async__ modifier converts a function into an asynchronous function. For such functions an implicit Concurrent.Promise object is allocated, and all returns and yields are converted into setting the promise followed by returning UNDEFINED, except for the first return or yield which will return the Concurrent.Future corresponding to the promise.
__async__ int foo(mixed ... args) { return 17; }
The above behaves similar to:
Concurrent.Future(<int>) foo(mixed ... args)
{
  Concurrent.Promise(<int>) __async_promise__ = Concurrent.Promise(<int>)();
  __generator__ lambda() {
    __async_promise__->failure(catch {
        __async_promise__->success(17);
        return UNDEFINED;
      });
    return UNDEFINED;
  }()();
  return __async_promise__->future();
}
Restartable functions are required in order to be able to use predef::await().
Note that this modifier is new in Pike 9.0.
The static modifier is currently identical to the protected modifier except for warning that it is being used (as of Pike 9.0).
Note: In a future version of Pike this may change. Do not use.
As mentioned in the beginning of this chapter, the assignment operator (=) does not copy anything when you use it on a pointer type. Instead it just creates another reference to the memory object. In most situations this does not present a problem, and it speeds up Pike's performance. However, you must be aware of this when programming. This can be illustrated with an example:
int main(int argc, array(string) argv)
{
  array(string) tmp;
  tmp=argv;
  argv[0]="Hello world.\n";
  write(tmp[0]);
}
This program will of course write Hello world.
Sometimes you want to create a copy of a mapping, array or object. To do so you simply call copy_value with whatever you want to copy as argument. Copy_value is recursive, which means that if you have an array containing arrays, copies will be made of all those arrays.
If you don't want to copy recursively, or you know you don't have to copy recursively, you can use the plus operator instead. For instance, to create a copy of an array you simply add an empty array to it, like this: copy_of_arr = arr + ({}); If you need to copy a mapping you use an empty mapping, and for a multiset you use an empty multiset.
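The difference between the two ways of copying shows up as soon as the data is nested. The following sketch uses made-up variable names to compare them.

```pike
int main()
{
  array(array(int)) nested = ({ ({ 1, 2 }), ({ 3, 4 }) });
  array(array(int)) shallow = nested + ({});       // copies only the outer array
  array(array(int)) deep = copy_value(nested);     // copies the inner arrays too

  nested[0][0] = 99;      // change an inner array
  nested[1] = ({ 0 });    // replace a slot in the outer array

  write("%d\n", shallow[0][0]);        // 99: the inner arrays are still shared
  write("%d\n", sizeof(shallow[1]));   // 2: the outer array is a separate copy
  write("%d\n", deep[0][0]);           // 1: nothing is shared
  return 0;
}
```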
When declaring a variable, you also have to specify what type of variable it is. For most types, such as int and string this is very easy. But there are much more interesting ways to declare variables than that, let's look at a few examples:
int x;             // x is an integer
int|string x;      // x is a string or an integer
array(string) x;   // x is an array of strings
array x;           // x is an array of mixed
mixed x;           // x can be any type
string *x;         // x is an array of strings

// x is a mapping from string to int
mapping(string:int) x;

// x implements Stdio.File
Stdio.File x;

// x implements Stdio.File
object(Stdio.File) x;

// x is a function that takes two integer
// arguments and returns a string
function(int,int:string) x;

// x is a function taking any amount of
// integer arguments and returns nothing.
function(int...:void) x;

// x is ... complicated
mapping(string:function(string|int...:mapping(string:array(string)))) x;
As you can see there are some interesting ways to specify types. Here is a list of what is possible: