11. Parsers

Module Parser.XML


Method autoconvert

string autoconvert(string xml)

Class Parser.XML.Simple


Method allow_rxml_entities

void allow_rxml_entities(bool yes_no)


Method compat_allow_errors

void compat_allow_errors(string version)

Description

Set whether the parser should allow certain errors for compatibility with earlier versions. version can be:

"7.2"

Allow more data after the root element.

"7.6"

Allow multiple and invalidly placed "<?xml ... ?>" and "<!DOCTYPE ... >" declarations (invalid "<?xml ... ?>" declarations are otherwise treated as normal PIs). Allow "<![CDATA[ ... ]]>" outside the root element. Allow the root element to be absent.

version can also be zero to enable all error checks.


Method define_entity

void define_entity(string entity, string s, function(:void) cb, mixed ... extras)

Description

Define an entity or an SMEG.

Parameter entity

Entity name, or SMEG name (if preceded by a "%").

Parameter s

Expansion of the entity. Entity evaluation will be performed.

See also

define_entity_raw()


Method define_entity_raw

void define_entity_raw(string entity, string raw)

Description

Define an entity or an SMEG.

Parameter entity

Entity name, or SMEG name (if preceded by a "%").

Parameter raw

Verbatim expansion of the entity.

See also

define_entity()


Method lookup_entity

string lookup_entity(string entity)

Returns

Returns the verbatim expansion of the entity.

Note

Added in Pike 7.7.
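
As a minimal sketch of how define_entity_raw() and lookup_entity() fit together, the following registers an entity and reads its stored expansion back. The entity name "company" and its value are made up for illustration, and the helper function name is hypothetical.

```pike
// Sketch: register an entity and read back its verbatim expansion.
// The name "company" and the value "Example Corp" are illustrative only.
string company_expansion()
{
  object parser = Parser.XML.Simple();

  // The raw variant stores the expansion as given.
  parser->define_entity_raw("company", "Example Corp");

  // lookup_entity() (Pike 7.7 and later) returns the stored expansion.
  return parser->lookup_entity("company");
}

int main()
{
  write("%s\n", company_expansion());
  return 0;
}
```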


Method parse

array parse(string xml, string context, function(:void) cb, mixed ... extra_args)
array parse(string xml, function(:void) cb, mixed ... extra_args)

Note

The context argument was introduced in Pike 7.8.


Method parse_dtd

mixed parse_dtd(string dtd, string context, function(:void) cb, mixed ... extras)
mixed parse_dtd(string dtd, function(:void) cb, mixed ... extras)

Note

The context argument was introduced in Pike 7.8.

Class Parser.XML.Simple.Context


Method create

Parser.XML.Simple.Context Parser.XML.Simple.Context(string s, string context, int flags, function(:void) cb, mixed ... extra_args)
Parser.XML.Simple.Context Parser.XML.Simple.Context(string s, int flags, function(:void) cb, mixed ... extra_args)

Parameter s
Parameter context

These two arguments are passed along to push_string().

Parameter flags

Parser flags.

Parameter cb

Callback function. This function gets called at various stages during the parsing.

Note

The context argument was introduced in Pike 7.8.


Method parse_dtd

mixed parse_dtd()


Method parse_entity

string parse_entity()


Method parse_xml

mixed parse_xml()


Method push_string

void push_string(string s)
void push_string(string s, string context)

Description

Add a string to parse at the current position.

Parameter s

String to insert at the current parsing position.

Parameter context

Optional context used to refer to the inserted string. This is typically a URL, but may also be an entity (preceded by an "&") or a SMEG reference (preceded by a "%"). It is not used by the XML parser as such, but is simply passed into the callbackinfo mapping as the field "context", where it can be useful for e.g. resolving relative URLs when parsing DTDs, or for determining where errors occur.

Note

The context argument was introduced in Pike 7.8.

Class Parser.XML.Validating

Description

Validating XML parser.

Validates an XML file according to a DTD.

cf http://www.w3.org/TR/REC-xml/


Method get_external_entity

string get_external_entity(string sysid, string|void pubid, mapping|__deprecated__(int)|void info, mixed ... extra)

Description

Get an external entity.

Called when a <!DOCTYPE> with a SYSTEM identifier is encountered, or when an entity reference needs expanding.

Parameter sysid

The SYSTEM identifier.

Parameter pubid

The PUBLIC identifier (if any).

Parameter info

The callbackinfo mapping containing the current parser state.

Parameter extra

The extra arguments as passed to parse() or parse_dtd().

Returns

Returns a string with a DTD fragment on success. Returns 0 (zero) on failure.

Note

Returning zero will cause the validator to report an error.

Note

In Pike 7.7 and earlier info had the value 0 (zero).

Note

The default implementation always returns 0 (zero). Override this function to provide other behaviour.

See also

parse(), parse_dtd()
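
Since the default implementation always returns zero, a subclass has to supply the lookup. The sketch below is one possible override that reads the DTD from the local file system. The class name FileValidator and the choice to treat sysid as a plain file path are assumptions made for this example, not behaviour of the module.

```pike
// Sketch of a validator that resolves SYSTEM identifiers as local files.
// FileValidator is a hypothetical name; treating sysid as a file path
// is an assumption of this example.
class FileValidator
{
  inherit Parser.XML.Validating;

  string get_external_entity(string sysid, string|void pubid,
                             mapping|int|void info, mixed ... extra)
  {
    // Stdio.read_file() returns 0 when the file cannot be read,
    // which makes the validator report an error.
    return Stdio.read_file(sysid);
  }
}

int main()
{
  object v = FileValidator();
  // A path that does not exist yields 0 (failure).
  write("%O\n", v->get_external_entity("/nonexistent/example.dtd"));
  return 0;
}
```

A real application would probably also resolve relative identifiers against the "context" field of the info mapping, which this sketch leaves out.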


Inherit Simple

inherit .Simple : Simple

Description

Extends the Simple XML parser.


Method isname

int isname(string s)

Description

Check if s is a valid Name.


Method isnames

int isnames(string s)

Description

Check if s is a valid list of Names.


Method isnmtoken

int isnmtoken(string s)

Description

Check if s is a valid Nmtoken.


Method isnmtokens

int isnmtokens(string s)

Description

Check if s is a valid list of Nmtokens.


Method parse

array parse(string data, string|function(string, string, mapping, array|string, mapping(string:mixed), mixed ... :mixed) callback, mixed ... extra)

FIXME

Document this function


Method parse_dtd

array parse_dtd(string data, string|function(string, string, mapping, array|string, mapping(string:mixed), mixed ... :mixed) callback, mixed ... extra)

FIXME

Document this function


Method validate

private mixed validate(string kind, string name, mapping attributes, array|string contents, mapping(string:mixed) info, function(string, string, mapping, array|string, mapping(string:mixed), mixed ... :mixed) callback, array(mixed) extra)

Description

The validation callback function.

See also

::parse()

Class Parser.XML.Validating.Element

Description

XML Element node.

Module Parser.XML.NSTree

Description

A namespace aware version of Parser.XML.Tree. This implementation does as little validation as possible, so e.g. you can call your namespace xmlfoo without complaints.


Inherit Tree

inherit Parser.XML.Tree : Tree


Method parse_input

NSNode parse_input(string data, void|string default_ns)

Description

Takes an XML string data and produces a namespace node tree. If default_ns is given, it will be used as the default namespace.

Throws

Throws an error when an error is encountered during XML parsing.


Method visualize

string visualize(Node n, void|string indent)

Description

Makes a visualization of a node graph suitable for printing out on a terminal.

Example

> object x = parse_input("<a><b><c/>d</b><b><e/><f>g</f></b></a>");
> write(visualize(x));
Node(ROOT)
  NSNode(ELEMENT,"a")
    NSNode(ELEMENT,"b")
      NSNode(ELEMENT,"c")
      NSNode(TEXT)
    NSNode(ELEMENT,"b")
      NSNode(ELEMENT,"e")
      NSNode(ELEMENT,"f")
        NSNode(TEXT)
Result 1: 201

Class Parser.XML.NSTree.NSNode

Description

Namespace aware node.


Method add_namespace

void add_namespace(string ns, void|string symbol, void|bool chain)

Description

Adds a new namespace to this node. The preferred symbol to use to identify the namespace can be provided in the symbol argument. If chain is set, no attempts to overwrite an already defined namespace with the same identifier will be made.


Method diff_namespaces

mapping(string:string) diff_namespaces()

Description

Returns the difference between this node's and its parent's namespaces.


Method get_default_ns

string get_default_ns()

Description

Returns the default namespace in the current scope.


Method get_defined_nss

mapping(string:string) get_defined_nss()

Description

Returns a mapping with all the namespaces defined in the current scope, except the default namespace.

Note

The returned mapping is the same as the one in the node, so destructive changes will affect the node.


Method get_ns

string get_ns()

Description

Returns the namespace in which the current element is defined.


Method get_ns_attributes

mapping(string:mapping(string:string)) get_ns_attributes()

Description

Returns all the attributes in all namespaces that are associated with this node.

Note

The returned mapping is the same as the one in the node, so destructive changes will affect the node.


Method get_ns_attributes

mapping(string:string) get_ns_attributes(string namespace)

Description

Returns the attributes in this node that are declared in the provided namespace.


Method get_xml_name

string get_xml_name()

Description

Returns the element name as it occurs in xml files. E.g. "zonk:name" for the element "name" defined in a namespace denoted with "zonk". It will look up a symbol for the namespace in the symbol tables for the node and its parents. If none is found a new label will be generated by hashing the namespace.


Inherit Node

inherit Node : Node


Method remove_child

void remove_child(NSNode child)

Description

remove_child has not been updated to take care of namespace issues. To properly remove all the parent's namespaces from the child, call remove_node in the child.

Module Parser.XML.SloppyDOM

Description

A somewhat DOM-like library that implements lazy generation of the node tree, i.e. it's generated from the data upon lookup. There's also a little bit of XPath evaluation to do queries on the node tree.

Implementation note: This is generally more pragmatic than Parser.XML.DOM, meaning it's not so pretty and compliant, but more efficient.

Implementation status: There's only enough implemented to parse a node tree from source and access it, i.e. modification functions aren't implemented. Data hiding stuff like NodeList and NamedNodeMap is not implemented, partly since it's cumbersome to meet the "live" requirement. Also, Parser.HTML is used in XML mode to parse the input. Thus it's too error tolerant to be XML compliant, and it currently doesn't handle DTD elements, like "<!DOCTYPE", or the XML declaration (i.e. "<?xml version='1.0'?>").


Method parse

Document parse(string source, void|int raw_values)

Description

Normally entities are decoded, and Node.xml_format will encode them again. If raw_values is nonzero then all text and attribute values are instead kept in their original form.
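
The effect of raw_values can be seen by reading the same text content with and without the flag. This is a sketch only: the sample input and the helper name text_of_a are invented for illustration.

```pike
// Returns the text content of the top-level <a> element.
// raw_values is passed straight through to Parser.XML.SloppyDOM.parse().
string text_of_a(string source, int raw_values)
{
  object doc = Parser.XML.SloppyDOM.parse(source, raw_values);
  // Document.get_elements() looks among the top-level elements only.
  string text = doc->get_elements("a")[0]->get_text_content();
  // The node tree is likely cyclic, so destruct it when done.
  destruct(doc);
  return text;
}

int main()
{
  string source = "<a>fish &amp; chips</a>";
  write("decoded: %s\n", text_of_a(source, 0));
  write("raw:     %s\n", text_of_a(source, 1));
  return 0;
}
```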

Class Parser.XML.SloppyDOM.Document

Note

The node tree is very likely a cyclic structure, so it might be a good idea to destruct it when you're finished with it, to avoid garbage. Destructing the Document object always destroys all nodes in it.


Method get_elements

array(Element) get_elements(string name)

Description

Note that this one looks among the top level elements, as opposed to get_elements_by_tag_name. This means that if the document is correct, you can only look up the single top level element here.

Note

Not DOM compliant.


Method get_raw_values

int get_raw_values()

Note

Not DOM compliant.


Inherit NodeWithChildElements

inherit NodeWithChildElements : NodeWithChildElements

Class Parser.XML.SloppyDOM.Node

Description

Basic node.


Method get_text_content

string get_text_content()

Description

If the raw_values flag is set in the owning document, the text is returned with entities and CDATA blocks intact.

See also

parse


Method simple_path

mapping(string:string)|Node|array(mapping(string:string)|Node)|string simple_path(string path, void|int xml_format)

Description

Access a node or a set of nodes through an expression that is a subset of an XPath RelativeLocationPath in abbreviated form.

That means one or more Steps separated by "/" or "//". A Step consists of an AxisSpecifier followed by a NodeTest and then optionally by one or more Predicates.

"/" before a Step causes it to be matched only against the immediate children of the node(s) selected by the previous Step. "//" before a Step causes it to be matched against any children in the tree below the node(s) selected by the previous Step. The initial selection before the first Step is this element.

The currently allowed AxisSpecifier NodeTest combinations are:

  • name to select all elements with the given name. The name can be "*" to select all.

  • @name to select all attributes with the given name. The name can be "*" to select all.

  • comment() to select all comments.

  • text() to select all text and CDATA blocks. Note that all entity references are also selected, under the assumption that they would expand to text only.

  • processing-instruction("name") to select all processing instructions with the given name. The name can be left out to select all. Either ' or " may be used to delimit the name. For compatibility, it can also occur without surrounding quotes.

  • node() to select all nodes, i.e. the whole content of an element node.

  • . to select the currently selected element itself.

A Predicate has the form [PredicateExpr], where PredicateExpr can currently take any of the following forms:

  • An integer indexes one item in the selected set, according to the document order. A negative index counts from the end of the set.

  • A RelativeLocationPath as specified above. It's executed for each element in the selected set and those where it yields an empty result are filtered out while the rest remain in the set.

  • A RelativeLocationPath as specified above followed by ="value". The path is executed for each element in the selected set and those where the text result of it is equal to the given value remain in the set. Either ' or " may be used to delimit the value.

If xml_format is nonzero, the return value is an XML-formatted string of all the matched nodes, in document order. Otherwise the return value is as follows:

Attributes are returned as one or more index/value pairs in a mapping. Other nodes are returned as the node objects. If the expression has a form that can give at most one answer (i.e. there's a predicate with an integer index), then a single mapping or node is returned, or zero if there was no match. If the expression can give more answers, then the return value is an array containing zero or more attribute mappings and/or nodes. The array follows document order.

Note

Not DOM compliant.
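
A few of the path forms described above can be sketched as follows. The sample document, the constant SOURCE and the helper count_items are hypothetical and exist only for this illustration. The sketch sticks to a plain element step, an attribute step and a negative index.

```pike
// Hypothetical sample document used only for this illustration.
constant SOURCE =
  "<list>"
  "<item id='1'>first</item>"
  "<item id='2'>second</item>"
  "</list>";

// Number of <item> elements found below the top-level <list>.
int count_items(string source)
{
  object doc = Parser.XML.SloppyDOM.parse(source);
  // No integer predicate, so the result is an array of nodes.
  array items = doc->simple_path("list/item");
  int n = sizeof(items);
  destruct(doc);
  return n;
}

int main()
{
  object doc = Parser.XML.SloppyDOM.parse(SOURCE);
  write("%d items\n", count_items(SOURCE));
  // "@id" selects attributes, which are returned as mappings.
  write("%O\n", doc->simple_path("list/item/@id"));
  // "[-1]" picks the last item; a nonzero xml_format returns XML text.
  write("%s\n", doc->simple_path("list/item[-1]", 1));
  destruct(doc);
  return 0;
}
```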


Method xml_format

string xml_format()

Description

Returns the formatted XML that corresponds to the node tree.

Note

Not DOM compliant.

Class Parser.XML.SloppyDOM.NodeWithChildElements

Description

Node with child elements.


Method get_descendant_elements

array(Element) get_descendant_elements()

Description

Returns all descendant elements in document order.

Note

Not DOM compliant.


Method get_descendant_nodes

array(Node) get_descendant_nodes()

Description

Returns all descendant nodes (except attribute nodes) in document order.

Note

Not DOM compliant.


Method get_elements

array(Element) get_elements(string name)

Description

Lightweight variant of get_elements_by_tag_name that returns a simple array instead of a fancy live NodeList.

Note

Not DOM compliant.


Inherit NodeWithChildren

inherit NodeWithChildren : NodeWithChildren

Module Parser.XML.Tree

Description

XML parser that generates node-trees.

Has some support for XML namespaces (see http://www.w3.org/TR/REC-xml-names/ and RFC 2518, section 23.4).

Note

This module defines two sets of node trees: the SimpleNode-based and the Node-based. The main difference between the two is that the Node-based trees have parent pointers, which tend to generate circular data references and thus garbage.

There are some more subtle differences between the two. Please read the documentation carefully.


Constant DTD_ATTLIST

constant int Parser.XML.Tree.DTD_ATTLIST


Constant DTD_ELEMENT

constant int Parser.XML.Tree.DTD_ELEMENT


Constant DTD_ENTITY

constant int Parser.XML.Tree.DTD_ENTITY


Constant DTD_NOTATION

constant int Parser.XML.Tree.DTD_NOTATION


Constant STOP_WALK

constant int Parser.XML.Tree.STOP_WALK


Constant XML_ATTR

constant int Parser.XML.Tree.XML_ATTR

Description

Attribute nodes are created on demand.


Constant XML_COMMENT

constant int Parser.XML.Tree.XML_COMMENT


Constant XML_DOCTYPE

constant int Parser.XML.Tree.XML_DOCTYPE


Constant XML_ELEMENT

constant int Parser.XML.Tree.XML_ELEMENT


Constant XML_HEADER

constant int Parser.XML.Tree.XML_HEADER


Constant XML_NODE

constant Parser.XML.Tree.XML_NODE


Constant XML_PI

constant int Parser.XML.Tree.XML_PI


Constant XML_ROOT

constant int Parser.XML.Tree.XML_ROOT


Constant XML_TEXT

constant int Parser.XML.Tree.XML_TEXT


Method attribute_quote

string attribute_quote(string data)

Description

Quotes the string given in data by escaping &, <, >, ' and ".


Method parse_file

Node parse_file(string path, bool|void parse_namespaces)

Description

Loads the XML file path, creates a node tree representation and returns the root node.


Method parse_input

RootNode parse_input(string data, void|bool no_fallback, void|bool force_lowercase, void|mapping(string:string) predefined_entities, void|bool parse_namespaces, ParseFlags|void flags)

Description

Takes an XML string and produces a node tree.

Note

flags is not used for PARSE_WANT_ERROR_CONTEXT, PARSE_FORCE_LOWERCASE or PARSE_ENABLE_NAMESPACES since they are covered by the separate flag arguments.


Method roxen_attribute_quote

string roxen_attribute_quote(string data)

Description

Quotes strings just like attribute_quote, but entities in the form &foo.bar; will not be quoted.


Method roxen_text_quote

string roxen_text_quote(string data)

Description

Quotes strings just like text_quote, but entities in the form &foo.bar; will not be quoted.


Method simple_parse_file

SimpleRootNode simple_parse_file(string path, void|mapping predefined_entities, ParseFlags|void flags, string|void default_namespace)

Description

Loads the XML file path, creates a SimpleNode tree representation and returns the root node.


Method simple_parse_input

SimpleRootNode simple_parse_input(string data, void|mapping predefined_entities, ParseFlags|void flags, string|void default_namespace)

Description

Takes an XML string and produces a SimpleNode tree.


Method text_quote

string text_quote(string data)

Description

Quotes the string given in data by escaping &, < and >.
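
A typical use of the two quoting functions is building markup from untrusted strings: attribute_quote for attribute values and text_quote for element content. The helper make_link below is a hypothetical example, not part of the module.

```pike
// Builds an <a> element from a URL and a label, quoting both.
// make_link is a made-up helper for this illustration.
string make_link(string url, string label)
{
  return sprintf("<a href=\"%s\">%s</a>",
                 Parser.XML.Tree.attribute_quote(url),
                 Parser.XML.Tree.text_quote(label));
}

int main()
{
  write("%s\n", make_link("/q?a=1&b=2", "x < y"));
  return 0;
}
```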

Enum Parser.XML.Tree.ParseFlags

Description

Flags used together with simple_parse_input() and simple_parse_file().


Constant PARSE_CHECK_ALL_ERRORS

constant Parser.XML.Tree.PARSE_CHECK_ALL_ERRORS


Constant PARSE_COMPAT_ALLOW_ERRORS_7_2

constant Parser.XML.Tree.PARSE_COMPAT_ALLOW_ERRORS_7_2


Constant PARSE_COMPAT_ALLOW_ERRORS_7_6

constant Parser.XML.Tree.PARSE_COMPAT_ALLOW_ERRORS_7_6


Constant PARSE_DISALLOW_RXML_ENTITIES

constant Parser.XML.Tree.PARSE_DISALLOW_RXML_ENTITIES


Constant PARSE_ENABLE_NAMESPACES

constant Parser.XML.Tree.PARSE_ENABLE_NAMESPACES


Constant PARSE_FORCE_LOWERCASE

constant Parser.XML.Tree.PARSE_FORCE_LOWERCASE


Constant PARSE_WANT_ERROR_CONTEXT

constant Parser.XML.Tree.PARSE_WANT_ERROR_CONTEXT

Class Parser.XML.Tree.AbstractNode

Description

Base class for nodes with parent pointers.


Method add_child

AbstractNode add_child(AbstractNode c)

Description

Adds the node c to the list of children of this node. The new node is added last in the list.

Note

Returns the new child node, NOT the current node.

Returns

The new child node is returned.


Method add_child_after

AbstractNode add_child_after(AbstractNode c, AbstractNode old)

Description

Adds the node c to the list of children of this node. The node is added after the node old, which is assumed to be an existing child of this node. The node is added first if old is zero.

Returns

The current node.


Method add_child_before

AbstractNode add_child_before(AbstractNode c, AbstractNode old)

Description

Adds the node c to the list of children of this node. The node is added before the node old, which is assumed to be an existing child of this node. The node is added last if old is zero.

Returns

The current node.


Method clone

AbstractNode clone(void|int(-1..1) direction)

Description

Clones the node, optionally connected to parts of the tree. If direction is -1 the cloned node's parent will be set; if direction is 1 the cloned node's children will be set.


Method fix_tree

void fix_tree()

Description

Fix all parent pointers recursively in a tree that has been built with tmp_add_child.


Method get_ancestors

array(AbstractNode) get_ancestors(bool include_self)

Description

Returns a list of all ancestors, with the top node last. The list will start with this node if include_self is set.


Method get_following

array(AbstractNode) get_following()

Description

Returns all the nodes that follow the current one.


Method get_following_siblings

array(AbstractNode) get_following_siblings()

Description

Returns all following siblings, i.e. all siblings present after this node in the parent's children list.


Method get_parent

AbstractNode get_parent()

Description

Returns the parent node.


Method get_preceding

array(AbstractNode) get_preceding()

Description

Returns all preceding nodes, excluding this node's ancestors.


Method get_preceding_siblings

array(AbstractNode) get_preceding_siblings()

Description

Returns all preceding siblings, i.e. all siblings present before this node in the parent's children list.


Method get_root

AbstractNode get_root()

Description

Follows all parent pointers and returns the root node.


Method get_siblings

array(AbstractNode) get_siblings()

Description

Returns all siblings, including this node.


Inherit AbstractSimpleNode

inherit AbstractSimpleNode : AbstractSimpleNode


Method low_clone

AbstractNode low_clone()

Description

Returns an initialized copy of the node.

Note

The returned node has no children, and no parent.


Method remove_child

void remove_child(AbstractNode c)

Description

Removes all occurrences of the provided node from the called node's list of children. The removed node's parent reference is set to null.


Method remove_node

void remove_node()

Description

Removes this node from its parent. The parent reference is set to null.


Method replace_child

AbstractNode replace_child(AbstractNode old, AbstractNode new)

Description

Replaces the first occurrence of the old node child with the new node child. All parent references are updated.

Note

The returned value is NOT the current node.

Returns

Returns the new child node.


Method replace_children

void replace_children(array(AbstractNode) children)

Description

Replaces the node's children with the provided ones. All parent references are updated.


Method replace_node

AbstractNode replace_node(AbstractNode new)

Description

Replaces this node with the provided one.

Returns

Returns the new node.


Method set_parent

void set_parent(AbstractNode parent)

Description

Sets the parent node to parent.


Method tmp_add_child
Method tmp_add_child_before
Method tmp_add_child_after

AbstractNode tmp_add_child(AbstractNode c)
AbstractNode tmp_add_child_before(AbstractNode c, AbstractNode old)
AbstractNode tmp_add_child_after(AbstractNode c, AbstractNode old)

Description

Variants of add_child, add_child_before and add_child_after that don't set the parent pointer in the newly added children.

This is useful while building a node tree, to get efficient refcount garbage collection if the build stops abruptly. fix_tree has to be called on the root node when the building is done.
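
The build-then-fix pattern might look like the sketch below. The element names and the helper build_tree are illustrative assumptions. Only tmp_add_child, fix_tree, get_parent, the [] operator and zap_tree are used on the nodes.

```pike
// Builds <root><child/></root> without parent pointers, then fixes
// them in a single pass. Element names are illustrative only.
object build_tree()
{
  object root = Parser.XML.Tree.RootNode();
  object top = Parser.XML.Tree.ElementNode("root", ([]));
  object child = Parser.XML.Tree.ElementNode("child", ([]));

  top->tmp_add_child(child);
  root->tmp_add_child(top);

  // Parent pointers stay unset until fix_tree() is called on the root.
  root->fix_tree();
  return root;
}

int main()
{
  object root = build_tree();
  write("parent pointer set: %d\n", root[0]->get_parent() == root);
  root->zap_tree();
  return 0;
}
```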

Class Parser.XML.Tree.AbstractSimpleNode

Description

Base class for nodes.


Method `[]

AbstractSimpleNode res = Parser.XML.Tree.AbstractSimpleNode()[ pos ]

Description

The [] operator indexes among the node's children, so node[0] returns the first node and node[-1] the last.

Note

The [] operator will select a node from all the node's children, not just its element children.
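
The following sketch illustrates that indexing covers text nodes as well as elements. The helper name is hypothetical, and get_node_type() is assumed to be available on the nodes (it is not documented in this section).

```pike
// Returns the node types of the first and last child of the top element.
// get_node_type() is assumed to exist on the parsed nodes.
array(int) first_and_last_types(string xml)
{
  object root = Parser.XML.Tree.simple_parse_input(xml);
  object top = root[0];   // the single top-level element
  return ({ top[0]->get_node_type(), top[-1]->get_node_type() });
}

int main()
{
  // The first child of <a> is a text node, the last one an element.
  array(int) types = first_and_last_types("<a>text<b/></a>");
  write("first is text: %d\n", types[0] == Parser.XML.Tree.XML_TEXT);
  write("last is element: %d\n", types[1] == Parser.XML.Tree.XML_ELEMENT);
  return 0;
}
```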


Method add_child

AbstractSimpleNode add_child(AbstractSimpleNode c)

Description

Adds the given node to the list of children of this node. The new node is added last in the list.

Note

The return value differs from the one returned by Node()->add_child().

Returns

The current node.


Method add_child_after

AbstractSimpleNode add_child_after(AbstractSimpleNode c, AbstractSimpleNode old)

Description

Adds the node c to the list of children of this node. The node is added after the node old, which is assumed to be an existing child of this node. The node is added first if old is zero.

Returns

The current node.


Method add_child_before

AbstractSimpleNode add_child_before(AbstractSimpleNode c, AbstractSimpleNode old)

Description

Adds the node c to the list of children of this node. The node is added before the node old, which is assumed to be an existing child of this node. The node is added last if old is zero.

Returns

The current node.


Method clone

AbstractSimpleNode clone()

Description

Returns a clone of the sub-tree rooted in the node.


Method count_children

int count_children()

Description

Returns the number of children of the node.


Method get_children

array(AbstractSimpleNode) get_children()

Description

Returns all the node's children.


Method get_descendants

array(AbstractSimpleNode) get_descendants(bool include_self)

Description

Returns a list of all descendants in document order. Includes this node if include_self is set.


Method get_last_child

AbstractSimpleNode get_last_child()

Description

Returns the last child node or zero.


Method iterate_children

int iterate_children(function(AbstractSimpleNode, mixed ... :int|void) callback, mixed ... args)

Description

Iterates over the node's children from left to right, calling the function callback for every node. If the callback function returns STOP_WALK, the iteration is promptly aborted and STOP_WALK is returned.


Method low_clone

AbstractSimpleNode low_clone()

Description

Returns an initialized copy of the node.

Note

The returned node has no children.


Method remove_child

void remove_child(AbstractSimpleNode c)

Description

Removes all occurrences of the provided node from the list of children of this node.


Method replace_child

AbstractSimpleNode replace_child(AbstractSimpleNode old, AbstractSimpleNode new)

Description

Replaces the first occurrence of the old node child with the new node child.

Note

The return value differs from the one returned by Node()->replace_child().

Returns

Returns the current node on success, and 0 (zero) if the node old wasn't found.


Method replace_children

void replace_children(array(AbstractSimpleNode) children)

Description

Replaces the node's children with the provided ones.


Method walk_inorder

int walk_inorder(function(AbstractSimpleNode, mixed ... :int|void) callback, mixed ... args)

Description

Traverses the node subtree in inorder: left subtree first, then the root node, and finally the remaining subtrees, calling the function callback for every node. If callback returns STOP_WALK, the traversal is promptly aborted and STOP_WALK is returned.


Method walk_postorder

int walk_postorder(function(AbstractSimpleNode, mixed ... :int|void) callback, mixed ... args)

Description

Traverses the node subtree in postorder: first the subtrees from left to right, then the root node, calling the function callback for every node. If callback returns STOP_WALK, the traversal is promptly aborted and STOP_WALK is returned.


Method walk_preorder

int walk_preorder(function(AbstractSimpleNode, mixed ... :int|void) callback, mixed ... args)

Description

Traverses the node subtree in preorder: root node first, then the subtrees from left to right, calling the function callback for every node. If callback returns STOP_WALK, the traversal is promptly aborted and STOP_WALK is returned.
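
A preorder walk can be sketched as below: one helper collects element names in document order, the other uses STOP_WALK to stop at the first match. The helper names are hypothetical, and get_node_type() and get_any_name() are assumed to be available on the parsed nodes (they are not documented in this section).

```pike
// Collects the names of all element nodes in preorder (document order).
array(string) element_names(string xml)
{
  object root = Parser.XML.Tree.simple_parse_input(xml);
  array(string) names = ({});
  root->walk_preorder(lambda(object node) {
    if (node->get_node_type() == Parser.XML.Tree.XML_ELEMENT)
      names += ({ node->get_any_name() });
    return 0;   // keep walking
  });
  return names;
}

// Returns 1 if an element with the given name exists, stopping early.
int has_element(string xml, string name)
{
  object root = Parser.XML.Tree.simple_parse_input(xml);
  int res = root->walk_preorder(lambda(object node) {
    if (node->get_node_type() == Parser.XML.Tree.XML_ELEMENT &&
        node->get_any_name() == name)
      return Parser.XML.Tree.STOP_WALK;
    return 0;
  });
  return res == Parser.XML.Tree.STOP_WALK;
}

int main()
{
  write("%O\n", element_names("<a><b/><c><d/></c></a>"));
  write("%d\n", has_element("<a><b/><c><d/></c></a>", "d"));
  return 0;
}
```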


Method walk_preorder_2

int walk_preorder_2(function(AbstractSimpleNode, mixed ... :int|void) cb_1, function(AbstractSimpleNode, mixed ... :int|void) cb_2, mixed ... args)

Description

Traverses the node subtree in preorder: root node first, then the subtrees from left to right. For each node, cb_1 is called before iterating through its children, and then cb_2 (which always gets called, even if the walk is aborted earlier). If a callback function returns STOP_WALK, the descent is aborted and STOP_WALK is returned once all waiting cb_2 functions have been called.
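
The paired callbacks map naturally onto "open" and "close" events. The sketch below uses them to emit a tag-only outline of a document. The helper name outline is hypothetical, and get_node_type() and get_any_name() are assumed to be available on the nodes.

```pike
// Emits an open tag from cb_1 and a close tag from cb_2 for every
// element, skipping text and other node types.
string outline(string xml)
{
  object root = Parser.XML.Tree.simple_parse_input(xml);
  String.Buffer out = String.Buffer();
  root->walk_preorder_2(
    lambda(object node) {
      if (node->get_node_type() == Parser.XML.Tree.XML_ELEMENT)
        out->add("<", node->get_any_name(), ">");
      return 0;
    },
    lambda(object node) {
      if (node->get_node_type() == Parser.XML.Tree.XML_ELEMENT)
        out->add("</", node->get_any_name(), ">");
      return 0;
    });
  return out->get();
}

int main()
{
  write("%s\n", outline("<a><b/>text<c/></a>"));
  return 0;
}
```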


Method zap_tree

void zap_tree()

Description

Destructs the tree recursively. When the inheriting AbstractNode or Node is used, which have parent pointers, this function should be called for every tree that is no longer in use, to avoid frequent garbage collector runs.
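
A sketch of the intended usage with a Node-based tree: extract what is needed, then zap the tree. The helper name count_top_level is made up for this example.

```pike
// Parses a document into a Node-based tree, reads one value from it
// and then destructs the tree explicitly.
int count_top_level(string xml)
{
  object root = Parser.XML.Tree.parse_input(xml);
  int n = root->count_children();
  // Node-based trees have parent pointers and thus reference cycles;
  // zap_tree() destructs the nodes rather than leaving them to the gc.
  root->zap_tree();
  return n;
}

int main()
{
  write("%d\n", count_top_level("<a><b/></a>"));
  return 0;
}
```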

Class Parser.XML.Tree.AttributeNode


Method create

Parser.XML.Tree.AttributeNode Parser.XML.Tree.AttributeNode(string name, string value)


Inherit Node

inherit Node : Node

Class Parser.XML.Tree.CommentNode


Method create

Parser.XML.Tree.CommentNode Parser.XML.Tree.CommentNode(string text)


Inherit Node

inherit Node : Node

Class Parser.XML.Tree.DTDAttlistNode


Method create

Parser.XML.Tree.DTDAttlistNode Parser.XML.Tree.DTDAttlistNode(string name, mapping(string:string) attrs, string contents)


Inherit Node

inherit Node : Node

Class Parser.XML.Tree.DTDElementNode


Method create

Parser.XML.Tree.DTDElementNode Parser.XML.Tree.DTDElementNode(string name, array expression)


Inherit DTDElementHelper

inherit DTDElementHelper : DTDElementHelper


Inherit Node

inherit Node : Node

Class Parser.XML.Tree.DTDEntityNode


Method create

Parser.XML.Tree.DTDEntityNode Parser.XML.Tree.DTDEntityNode(string name, mapping(string:string) attrs, string contents)


Inherit Node

inherit Node : Node

Class Parser.XML.Tree.DTDNotationNode


Method create

Parser.XML.Tree.DTDNotationNode Parser.XML.Tree.DTDNotationNode(string name, mapping(string:string) attrs, string contents)


Inherit Node

inherit Node : Node

Class Parser.XML.Tree.DoctypeNode


Method create

Parser.XML.Tree.DoctypeNode Parser.XML.Tree.DoctypeNode(string name, mapping(string:string) attrs, array contents)


Inherit Node

inherit Node : Node

Class Parser.XML.Tree.ElementNode


Method create

Parser.XML.Tree.ElementNode Parser.XML.Tree.ElementNode(string name, mapping(string:string) attrs)


Inherit Node

inherit Node : Node

Class Parser.XML.Tree.HeaderNode


Method create

Parser.XML.Tree.HeaderNode Parser.XML.Tree.HeaderNode(mapping(string:string) attrs)


Inherit Node

inherit Node : Node

Class Parser.XML.Tree.Node

Description

XML node with parent pointers.


Method get_attr_name

string get_attr_name()

Description

Returns the name of the attribute node.


Method get_attribute_nodes

array(Node) get_attribute_nodes()

Description

Creates and returns an array of new nodes. They will not be added as proper children to the parent node, but the parent link in each node is set so that upwards traversal is possible.


Method get_tag_name

string get_tag_name()

Description

Returns the name of the element node, or the nearest element above if an attribute node.


Inherit AbstractNode

inherit AbstractNode : AbstractNode


Inherit VirtualNode

inherit VirtualNode : VirtualNode

Class Parser.XML.Tree.PINode


Method create

Parser.XML.Tree.PINode Parser.XML.Tree.PINode(string name, mapping(string:string) attrs, string contents)


Inherit Node

inherit Node : Node

Class Parser.XML.Tree.RootNode

Description

The root node of an XML-tree consisting of Nodes.


Method create

Parser.XML.Tree.RootNode Parser.XML.Tree.RootNode(string|void data, mapping|void predefined_entities, ParseFlags|void flags)


Method flush_node_id_cache

void flush_node_id_cache()

Description

Clears the node id cache built and used by get_element_by_id.


Method get_element_by_id

ElementNode get_element_by_id(string id, int|void force)

Description

Find the element with the specified id.

Parameter id

The XML id of the node to search for.

Parameter force

Force a regeneration of the id lookup cache. Needed the first time after the node tree has been modified by adding or removing element nodes, or by changing the id attribute of an element node.

Returns

Returns the element node with the specified id if any. Returns UNDEFINED otherwise.

See also

flush_node_id_cache
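
The role of the force argument can be sketched as follows. This assumes that the lookup is keyed on an attribute literally named "id", and that get_any_name() is available on element nodes. The sample document and the helper name lookup_after_edit are hypothetical.

```pike
// Looks up an element, modifies the tree, then looks up again with
// force set so that the id cache is rebuilt.
// Assumption: ids are taken from the attribute named "id".
array(string) lookup_after_edit()
{
  object root = Parser.XML.Tree.parse_input("<a><b id='x'/></a>");

  // The first call builds the id lookup cache.
  object b = root->get_element_by_id("x");
  string first = b && b->get_any_name();

  // Add a new element with an id; the cache is now stale.
  root[0]->add_child(Parser.XML.Tree.ElementNode("c", ([ "id": "y" ])));

  // force=1 regenerates the cache so the new element can be found.
  object c = root->get_element_by_id("y", 1);
  string second = c && c->get_any_name();

  root->zap_tree();
  return ({ first, second });
}

int main()
{
  write("%O\n", lookup_after_edit());
  return 0;
}
```

Calling flush_node_id_cache() after the modification would be an alternative to passing force.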


Inherit Node

inherit Node : Node


Inherit XMLParser

inherit XMLParser : XMLParser

Class Parser.XML.Tree.SimpleCommentNode


Method create

Parser.XML.Tree.SimpleCommentNode Parser.XML.Tree.SimpleCommentNode(string comment)


Inherit SimpleNode

inherit SimpleNode : SimpleNode

Class Parser.XML.Tree.SimpleDTDAttlistNode


Methodcreate

Parser.XML.Tree.SimpleDTDAttlistNodeParser.XML.Tree.SimpleDTDAttlistNode(stringname, mapping(string:string) attrs, stringcontents)


InheritSimpleNode

inherit SimpleNode : SimpleNode

Class Parser.XML.Tree.SimpleDTDElementNode


Methodcreate

Parser.XML.Tree.SimpleDTDElementNodeParser.XML.Tree.SimpleDTDElementNode(stringname, arrayexpression)


InheritDTDElementHelper

inherit DTDElementHelper : DTDElementHelper


InheritSimpleNode

inherit SimpleNode : SimpleNode

Class Parser.XML.Tree.SimpleDTDEntityNode


Methodcreate

Parser.XML.Tree.SimpleDTDEntityNodeParser.XML.Tree.SimpleDTDEntityNode(stringname, mapping(string:string) attrs, stringcontents)


InheritSimpleNode

inherit SimpleNode : SimpleNode

Class Parser.XML.Tree.SimpleDTDNotationNode


Methodcreate

Parser.XML.Tree.SimpleDTDNotationNodeParser.XML.Tree.SimpleDTDNotationNode(stringname, mapping(string:string) attrs, stringcontents)


InheritSimpleNode

inherit SimpleNode : SimpleNode

Class Parser.XML.Tree.SimpleDoctypeNode


Methodcreate

Parser.XML.Tree.SimpleDoctypeNodeParser.XML.Tree.SimpleDoctypeNode(stringname, mapping(string:string) attrs, arraycontents)


InheritSimpleNode

inherit SimpleNode : SimpleNode

Class Parser.XML.Tree.SimpleElementNode


Methodcreate

Parser.XML.Tree.SimpleElementNodeParser.XML.Tree.SimpleElementNode(stringname, mapping(string:string) attrs)


InheritSimpleNode

inherit SimpleNode : SimpleNode

Class Parser.XML.Tree.SimpleHeaderNode


Methodcreate

Parser.XML.Tree.SimpleHeaderNodeParser.XML.Tree.SimpleHeaderNode(mapping(string:string) attrs)


InheritSimpleNode

inherit SimpleNode : SimpleNode

Class Parser.XML.Tree.SimpleNode

Description

XML node without parent pointers and attribute nodes.


InheritAbstractSimpleNode

inherit AbstractSimpleNode : AbstractSimpleNode


InheritVirtualNode

inherit VirtualNode : VirtualNode

Class Parser.XML.Tree.SimplePINode


Methodcreate

Parser.XML.Tree.SimplePINodeParser.XML.Tree.SimplePINode(stringname, mapping(string:string) attrs, stringcontents)


InheritSimpleNode

inherit SimpleNode : SimpleNode

Class Parser.XML.Tree.SimpleRootNode

Description

The root node of an XML-tree consisting of SimpleNodes.


Methodcreate

Parser.XML.Tree.SimpleRootNodeParser.XML.Tree.SimpleRootNode(string|voiddata, mapping|voidpredefined_entities, ParseFlags|voidflags, string|voiddefault_namespace)


Methodflush_node_id_cache

voidflush_node_id_cache()

Description

Clears the node id cache built and used by get_element_by_id.


Methodget_element_by_id

SimpleElementNodeget_element_by_id(stringid, int|voidforce)

Description

Find the element with the specified id.

Parameter id

The XML id of the node to search for.

Parameter force

Force a regeneration of the id lookup cache. Needed the first time after the node tree has been modified by adding or removing element nodes, or by changing the id attribute of an element node.

Returns

Returns the element node with the specified id if any. Returns UNDEFINED otherwise.

See also

flush_node_id_cache


InheritSimpleNode

inherit SimpleNode : SimpleNode


InheritXMLParser

inherit XMLParser : XMLParser

Class Parser.XML.Tree.SimpleTextNode


Methodcreate

Parser.XML.Tree.SimpleTextNodeParser.XML.Tree.SimpleTextNode(stringtext)


InheritSimpleNode

inherit SimpleNode : SimpleNode

Class Parser.XML.Tree.TextNode


Methodcreate

Parser.XML.Tree.TextNodeParser.XML.Tree.TextNode(stringtext)


InheritNode

inherit Node : Node

Class Parser.XML.Tree.VirtualNode

Description

Node in XML tree


Methodcast

(int)Parser.XML.Tree.VirtualNode()
(float)Parser.XML.Tree.VirtualNode()
(string)Parser.XML.Tree.VirtualNode()
(array)Parser.XML.Tree.VirtualNode()
(mapping)Parser.XML.Tree.VirtualNode()
(multiset)Parser.XML.Tree.VirtualNode()

Description

It is possible to cast a node to a string, which will return render_xml() for that node.


Methodcreate

Parser.XML.Tree.VirtualNodeParser.XML.Tree.VirtualNode(inttype, stringname, mappingattr, stringtext)


Methodget_any_name

stringget_any_name()

Description

Return name of tag or name of attribute node.


Methodget_attributes

mapping(string:string) get_attributes()

Description

Returns this node's attributes. The returned mapping can be altered destructively to change the node's attributes.

See also

replace_attributes()


Methodget_doc_order

intget_doc_order()


Methodget_elements

array(AbstractNode) get_elements(string|voidname, bool|voidfull)

Description

Returns all element children to this node.

Parameter name

If provided, only elements with that name are returned.

Parameter full

If specified, name matching will be done against the full name.

Returns

Returns an array with matching nodes.


Methodget_first_element

AbstractNodeget_first_element(string|voidname, bool|voidfull)

Description

Returns the first element child to this node.

Parameter name

If provided, the first element child with that name is returned.

Parameter full

If specified, name matching will be done against the full name.

Returns

Returns the first matching node, and 0 if no such node was found.


Methodget_full_name

stringget_full_name()

Description

Return fully qualified name of the element node.


Methodget_namespace

stringget_namespace()

Description

Return the (resolved) namespace for this node.


Methodget_node_type

intget_node_type()

Description

Returns the node type. See defined node type constants.


Methodget_short_attributes

mappingget_short_attributes()

Description

Returns this node's namespace-adjusted attributes.

Note

set_short_namespaces() or set_short_attributes() must have been called before calling this function.


Methodget_tag_name

stringget_tag_name()

Description

Returns the name of the element node, or the nearest element above if an attribute node.


Methodget_text

stringget_text()

Description

Returns text content in node.


Methodrender_to_file

voidrender_to_file(Stdio.Filef, void|boolpreserve_roxen_entities)

Description

Creates an XML representation of the node sub tree and streams the output to the file f. If the flag preserve_roxen_entities is set, entities of the form &foo.bar; will not be escaped.


Methodrender_xml

stringrender_xml(void|boolpreserve_roxen_entities, void|mapping(string:string) namespace_lookup)

Description

Creates an XML representation of the node sub tree. If the flag preserve_roxen_entities is set, entities of the form &foo.bar; will not be escaped.

Parameter namespace_lookup

Mapping from namespace prefix to namespace symbol prefix.


Methodreplace_attributes

voidreplace_attributes(mapping(string:string) attrs)

Description

Replace the entire set of attributes.

See also

get_attributes()


Methodset_doc_order

voidset_doc_order(into)


Methodset_short_attributes

voidset_short_attributes(mappingshort_attrs)

Description

Sets this node's namespace-adjusted attributes.


Methodset_tag_name

voidset_tag_name(stringname)

Description

Change the tag name destructively. Can only be used on element and processing-instruction nodes.


Methodvalue_of_node

stringvalue_of_node()

Description

If the node is an attribute node or a text node, its value is returned. Otherwise the child text nodes are concatenated and returned.
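
Example

The following is a minimal sketch of how the node accessors above might be combined. It assumes the input is parsed with the SimpleRootNode constructor documented earlier; the helper name describe_items and the sample element and attribute names (list, item, kind) are made up for illustration.

```pike
// Hypothetical helper: returns "kind: text" for every <item> child
// of the top-level <list> element.
array(string) describe_items(string xml)
{
  // SimpleRootNode parses the data passed to its constructor.
  Parser.XML.Tree.SimpleRootNode root = Parser.XML.Tree.SimpleRootNode(xml);

  // First element child named "list" (0 if there is none).
  object list = root->get_first_element("list");
  if (!list) return ({});

  array(string) res = ({});
  foreach (list->get_elements("item"), object item) {
    mapping(string:string) attrs = item->get_attributes();
    // For an element node, value_of_node() concatenates its child text nodes.
    res += ({ attrs->kind + ": " + item->value_of_node() });
  }
  return res;
}

int main()
{
  write("%O\n", describe_items("<list><item kind='a'>one</item>"
                               "<item kind='b'>two</item></list>"));
  return 0;
}
```

Calling render_xml() on any of the nodes would give back the XML text for that sub tree.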

Class Parser.XML.Tree.XMLNSParser

Description

Namespace aware parser.


MethodEnter

mapping(string:string) Enter(mapping(string:string) attrs)

Description

Check attrs for namespaces.

Returns

Returns the namespace expanded version of attrs.

Class Parser.XML.Tree.XMLParser

Description

Mixin for parsing XML.

Uses Parser.XML.Simple to perform the actual parsing.


Methodnode_factory

protectedthis_programnode_factory(inttype, stringname, mappingattr, stringtext)

Description

Factory for creating nodes.

Parameter type

Type of node to create. One of:

XML_TEXT

XML text. text contains a string with the text.

XML_COMMENT

XML comment. text contains a string with the comment text.

XML_HEADER

<?xml?>-header. attr contains a mapping with the attributes.

XML_PI

XML processing instruction. name contains the name of the processing instruction and text the remainder.

XML_ELEMENT

XML element tag. name contains the name of the tag and attr the attributes.

XML_DOCTYPE

DTD information.

DTD_ENTITY
DTD_ELEMENT
DTD_ATTLIST
DTD_NOTATION

Parameter name

Name of the tag if applicable.

Parameter attr

Attributes for the tag if applicable.

Parameter text

Contained text of the tag if any.

This function is called during parsing to create the various XML nodes.

Overload this function to provide application-specific XML nodes.

Returns

Returns a node object representing the XML tag, or 0 (zero) if the subtree rooted in the tag should be cut.

Note

This function is not available in Pike 7.6 and earlier.

Class Parser.HTML

Description

This is a simple parser for SGML structured markups. It's not really HTML, but it's useful for that purpose.

The simple way to use it is to give it some information about available tags and containers, and what callbacks those are to call.

The object is easily reused, by calling the clone() function.

See also

add_tag, add_container, finish


Method_inspect

mapping_inspect()

Description

This is a low-level way of debugging a parser. This gives a mapping of the internal state of the Parser.HTML object.

The format and contents of this mapping may change without further notice.


Method_set_tag_callback
Method_set_entity_callback
Method_set_data_callback

Parser.HTML_set_tag_callback(function(:void)|string|arrayto_call)
Parser.HTML_set_entity_callback(function(:void)|string|arrayto_call)
Parser.HTML_set_data_callback(function(:void)|string|arrayto_call)

Description

These functions set up the parser object to call the given callbacks upon tags, entities and/or data. The callbacks will only be called if there isn't another tag/container/entity handler for these.

The callback function will be called with the parser object as first argument, and the active string as second. Note that no parsing of the contents has been done. Both endtags and normal tags are called; there is no container parsing.

The return values from the callbacks are handled in the same way as the return values from callbacks registered with add_tag and similar functions.

The data callback will be called as seldom as possible with the longest possible string, as long as it doesn't get called out of order with any other callback. It will never be called with a zero length string.

If a string or array is given instead of a function, it will act as the return value from the function. Arrays or empty strings are probably preferable to avoid recursion.

Returns

Returns the object being called.
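
Example

As an illustration of the catch-all tag callback described above, the sketch below removes every tag from its input and keeps only the text. The function name strip_tags is hypothetical; the callback returns an empty array so that nothing is pushed back for reparsing.

```pike
// Hypothetical helper: removes all tags from html and returns the text.
string strip_tags(string html)
{
  Parser.HTML p = Parser.HTML();

  // Called for every start and end tag that has no specific handler.
  // The second argument is the raw tag string; it is unused here.
  p->_set_tag_callback(lambda(Parser.HTML parser, string tag) {
    return ({});   // Replace the tag with nothing, without reparsing.
  });

  return p->finish(html)->read();
}

int main()
{
  write("%O\n", strip_tags("a<b>c</b>d"));
  return 0;
}
```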


Methodadd_tag
Methodadd_container
Methodadd_entity
Methodadd_quote_tag
Methodadd_tags
Methodadd_containers
Methodadd_entities

Parser.HTMLadd_tag(stringname, mixedto_do)
Parser.HTMLadd_container(stringname, mixedto_do)
Parser.HTMLadd_entity(stringentity, mixedto_do)
Parser.HTMLadd_quote_tag(stringname, mixedto_do, stringend)
Parser.HTMLadd_tags(mapping(string:mixed) tags)
Parser.HTMLadd_containers(mapping(string:mixed) containers)
Parser.HTMLadd_entities(mapping(string:mixed) entities)

Description

Registers the actions to take when parsing various things. Tags, containers, entities are as usual. add_quote_tag() adds a special kind of tag that reads any data until the next occurrence of the end string immediately before a tag end.

Parameter to_do

This argument can be any of the following.

function(:void)

The function will be called as a callback function. It will get the following arguments, depending on the type of callback.

 mixed tag_callback(Parser.HTML parser,mapping args,mixed ... extra)
 mixed container_callback(Parser.HTML parser,mapping args,string content,mixed ... extra)
 mixed entity_callback(Parser.HTML parser,mixed ... extra)
 mixed quote_tag_callback(Parser.HTML parser,string content,mixed ... extra)
	
string

This tag/container/entity is then replaced by the string. The string is normally not reparsed, i.e. it's equivalent to writing a function that returns the string in an array (but a lot faster). If reparse_strings is set the string will be reparsed, though.

array

The first element is a function as above. It will receive the rest of the array as extra arguments. If extra arguments are given by set_extra(), they will appear after the ones in this array.

int(0..)

If there is a tag/container/entity with the given name in the parser, it's removed.

The callback function can return:

string

This string will be pushed on the parser stack and be parsed. Be careful not to return anything in this way that could lead to infinite recursion.

array

The elements of the array are the result of the function. They will not be parsed, which is useful for avoiding infinite recursion. The array can be of any size, so the empty array is the most efficient value to return if you don't care about the result. If the parser is operating in mixed_mode, the array can contain anything. Otherwise only strings are allowed.

int(0..0)

This means "don't do anything", i.e. the item that generated the callback is left as it is, and the parser continues.

int(1..1)

Reparse the last item again. This is useful to parse a tag as a container, or vice versa: just add or remove callbacks for the tag and return this to jump to the right callback.

Returns

Returns the object being called.

See also

tags, containers, entities
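
Example

A small sketch of two of the to_do forms listed above: a plain string and a callback function. The function name render and the sample tags are chosen for illustration only.

```pike
// Hypothetical helper: rewrites <br> and <b>...</b> in html.
string render(string html)
{
  Parser.HTML p = Parser.HTML();

  // String action: each <br> tag is replaced by a newline.
  p->add_tag("br", "\n");

  // Function action: the container callback receives the parsed
  // arguments and the content. Returning an array means the result
  // is not reparsed.
  p->add_container("b", lambda(Parser.HTML parser, mapping args,
                               string content) {
    return ({ upper_case(content) });
  });

  return p->finish(html)->read();
}

int main()
{
  write("%O\n", render("one<br><b>two</b>"));
  return 0;
}
```

Returning an array rather than a string from the callback is the safer default, since a returned string is pushed back on the parser stack and parsed again.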


Methodat
Methodat_line
Methodat_char
Methodat_column

array(int) at()
intat_line()
intat_char()
intat_column()

Description

Returns the current position. Characters and columns count from 0, lines count from 1.

at() gives an array with the following layout.

Array
int0

Line.

int1

Character.

int2

Column.


Methodcase_insensitive_tag

intcase_insensitive_tag(void|intvalue)

Description

All tags and containers are matched case insensitively, and argument names are converted to lowercase. Tags added with add_quote_tag() are not affected, though. Switching to case insensitive mode and back won't preserve the case of registered tags and containers.


Methodclear_tags
Methodclear_containers
Methodclear_entities
Methodclear_quote_tags

Parser.HTMLclear_tags()
Parser.HTMLclear_containers()
Parser.HTMLclear_entities()
Parser.HTMLclear_quote_tags()

Description

Removes all registered definitions in the different categories.

Returns

Returns the object being called.

See also

add_tag, add_tags, add_container, add_containers, add_entity, add_entities


Methodclone

Parser.HTMLclone(mixed ... args)

Description

Clones the Parser.HTML object. A new object of the same class is created, filled with the parse setup from the old object.

This is the simplest way of flushing a parse feed/output.

The arguments to clone are sent to the new object, simplifying work for custom classes that inherit Parser.HTML.

Returns

Returns the new object.

Note

create is called _before_ the setup is copied.


Methodtags
Methodcontainers
Methodentities

mapping(string:mixed) tags()
mapping(string:mixed) containers()
mapping(string:mixed) entities()

Description

Returns the current callback settings. When matching is done case insensitively, all names will be returned in lowercase.

Implementation note: These run in constant time since they return copy-on-write mappings.

See also

add_tag, add_tags, add_container, add_containers, add_entity, add_entities


Methodcontext

stringcontext()

Description

Returns the current output context as a string.

"data"

In top level data. This is always returned when called from tag or container callbacks.

"arg"

In an unquoted argument.

"splice_arg"

In a splice argument.

The return value can also be a single character string, in which case the context is a quoted argument. The string contains the starting quote character.

This function is typically only useful in entity callbacks, which can be called both from text and argument values of different sorts.

See also

splice_arg


Methodcurrent

stringcurrent()

Description

Gives the current range of data, i.e. the whole tag/entity/etc being parsed in the current callback. Returns zero if there's no current range, i.e. when the function is not called in a callback.


Methodfeed

Parser.HTMLfeed()
Parser.HTMLfeed(strings, void|intdo_parse)

Description

Feed new data to the Parser.HTML object. This will start a scan and may result in callbacks. Note that it's possible that all data fed isn't processed - to do that, call finish().

If the function is called without arguments, no data is fed, but the parser is run. If the string argument is followed by a 0, ->feed(s,0);, the string is fed, but the parser isn't run.

Returns

Returns the object being called.

See also

finish, read, feed_insert


Methodfeed_insert

Parser.HTMLfeed_insert(strings)

Description

This pushes a string on the parser stack.

Returns

Returns the object being called.

Note

Don't use!


Methodfinish

Parser.HTMLfinish()
Parser.HTMLfinish(strings)

Description

Finish a parser pass. A string may be sent here, similar to feed().

Returns

Returns the object being called.
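
Example

The sketch below illustrates feeding data in pieces with feed() and then flushing with finish(). The function name parse_chunks and the <hr> replacement are assumptions made for the example. The tag is deliberately split across two chunks to show that the parser waits for the rest of it.

```pike
// Hypothetical helper: feeds the chunks one at a time, then finishes
// the pass and returns everything the parser produced.
string parse_chunks(array(string) chunks)
{
  Parser.HTML p = Parser.HTML();
  p->add_tag("hr", "---");

  foreach (chunks, string chunk)
    p->feed(chunk);      // May leave an incomplete tag unprocessed.

  p->finish();           // Processes whatever data remains.
  return p->read();
}

int main()
{
  write("%O\n", parse_chunks(({ "a<h", "r>b" })));
  return 0;
}
```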


Methodget_extra

arrayget_extra()

Description

Gets the extra arguments set by set_extra().

Returns

Returns an array with the extra arguments.


Methodignore_comments

intignore_comments(void|intvalue)


Methodignore_tags

intignore_tags(void|intvalue)

Description

Do not look for tags at all. Normally tags are matched even when there are no callbacks for them at all. When this is set, the tag delimiters '<' and '>' will be treated as any normal character.


Methodignore_unknown

intignore_unknown(void|intvalue)

Description

Treat unknown tags and entities as text data, continuing parsing for tags and entities inside them.

Note

When functions are specified with _set_tag_callback() or _set_entity_callback(), all tags or entities, respectively, are considered known. However, if one of those functions returns 1 and ignore_unknown is set, they are treated as text data instead of making another call to the same function again.


Methodlazy_argument_end

intlazy_argument_end(void|intvalue)

Description

A '>' in a tag argument closes both the argument and the tag, even if the argument is quoted.


Methodlazy_entity_end

intlazy_entity_end(void|intvalue)

Description

Normally, the parser searches indefinitely for the entity end character (i.e. ';'). When this flag is set, any of the characters '&', '<', '>', '"', ''', or whitespace breaks the search for the entity end, and the entity text is then ignored, i.e. treated as data.


Methodmatch_tag

intmatch_tag(void|intvalue)

Description

Unquoted nested tag starters and enders will be balanced when parsing tags. This is the default.


Methodmax_parse_depth

intmax_parse_depth(void|intvalue)

Description

Maximum recursion depth during parsing. Recursion occurs when a tag/container/entity/quote tag callback function returns a string to be reparsed. The default value is 10.


Methodmixed_mode

intmixed_mode(void|intvalue)

Description

Allow callbacks to return arbitrary data in the arrays, which will be concatenated in the output.


Methodnestling_entity_end

intnestling_entity_end(void|intvalue)


Methodparse_tag_args

mappingparse_tag_args(stringtag)

Description

Parses the tag arguments from a tag string without the name and surrounding brackets, i.e. a string of the form "some='tag'  args".

Returns

Returns a mapping containing the tag arguments.

See also

tag_args


Methodparse_tag_name

stringparse_tag_name(stringtag)

Description

Parses the tag name from a tag string without the surrounding brackets, i.e. a string of the form "tagname some='tag'  args".

Returns

Returns the tag name or an empty string if none.


Methodquote_stapling

intquote_stapling(int|voidenable)

Description

Enable old-style attribute quoting by stapling.

Parameter enable

Enable/disable the mode. Defaults to keeping the old setting.

Returns

Returns the prior setting.

Note

Any use of this mode is discouraged, and is only provided for compatibility with versions of Pike prior to 8.0.

Note

Note also that this mode will output runtime warnings whenever the mode has had an effect on the parsing.


Methodquote_tags

mapping(string:array(mixed|string)) quote_tags()

Description

Returns the current callback settings. The values are arrays ({callback, end_quote}). When matching is done case insensitively, all names will be returned in lowercase.

Implementation note: quote_tags() allocates a new mapping for every call and thus, unlike e.g. tags(), runs in linear time.

See also

add_quote_tag


Methodread

string|array(mixed) read()
string|array(mixed) read(intmax_elems)

Description

Read parsed data from the parser object.

Returns

Returns a string of parsed data if the parser isn't in mixed_mode, an array of arbitrary data otherwise.


Methodreparse_strings

intreparse_strings(void|intvalue)

Description

When a plain string is used as a tag/container/entity/quote tag callback, it's not reparsed if this flag is unset. Setting it causes all such strings to be reparsed.


Methodset_extra

Parser.HTMLset_extra(mixed ... args)

Description

Sets the extra arguments passed to all tag, container and entity callbacks.

Returns

Returns the object being called.


Methodsplice_arg

stringsplice_arg(void|stringname)

Description

If given a string, it sets the splice argument name to it. It returns the old splice argument name.

If a splice argument name is set, it's parsed in all tags, both those with callbacks and those without. Wherever it occurs, its value (after being parsed for entities in the normal way) is inserted directly into the tag. E.g:

<foo arg1="val 1" splice="arg2='val 2' arg3" arg4>

becomes

<foo arg1="val 1" arg2='val 2' arg3 arg4>

if "splice" is set as the splice argument name.


Methodtag

arraytag(void|mixeddefault_value)

Description

Returns the equivalent of the following calls.

Array
string0

tag_name()

mapping(string:mixed) 1

tag_args(default_value)

string2

tag_content()


Methodtag_args

mapping(string:mixed) tag_args(void|mixeddefault_value)

Description

Gives the arguments of the current tag, parsed to a convenient mapping consisting of key:value pairs. If the current thing isn't a tag, it gives zero. default_value is used for arguments which have no value in the tag. If default_value isn't given, the value is set to the same string as the key.


Methodtag_content

stringtag_content()

Description

Gives the content of the current tag, if it's a container or quote tag. Otherwise returns zero.


Methodtag_name

stringtag_name()

Description

Gives the name of the current tag, or zero. If used from an entity callback, it gives the string inside the entity.


Methodwrite_out

Parser.HTMLwrite_out(mixed ... args)

Description

Send data to the output stream, i.e. it won't be parsed and it won't be sent to the data callback, if any.

Any data is allowed when the parser is running in mixed_mode. Only strings are allowed otherwise.

Returns

Returns the object being called.


Methodws_before_tag_name

intws_before_tag_name(void|intvalue)

Description

Allow whitespace between the tag start character and the tag name.


Methodxml_tag_syntax

intxml_tag_syntax(void|intvalue)

Description

Whether or not to use XML syntax to tell empty tags and container tags apart.

0

Use HTML syntax only. If there's a '/' last in a tag, it's just treated as any other argument.

1

Use HTML syntax, but ignore a '/' if it comes last in a tag. This is the default.

2

Use XML syntax, but when a tag that does not end with '/>' is found which only got a non-container tag callback, treat it as a non-container (i.e. don't start to seek for the container end).

3

Use XML syntax only. If a tag got both container and non-container callbacks, the non-container callback is called when the empty element form (i.e. the one ending with '/>') is used, and the container callback otherwise. If only a container callback exists, it gets the empty string as content when there's none to be parsed. If only a non-container callback exists, it will be called (without the content argument) for both kinds of tags.

Module Parser


Methoddecode_numeric_xml_entity

stringdecode_numeric_xml_entity(stringchref)

Description

Decodes the numeric XML entity chref, e.g. "&#x34;", and returns the character as a string. chref is the name part of the entity, i.e. without the leading '&' and trailing ';'. Returns zero if chref isn't in a recognized form or if the character number is too large to be represented in a string.
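
Example

A short sketch of the calling convention described above. The wrapper decode_ref is hypothetical; it only strips the leading '&' and trailing ';' before passing the name part on.

```pike
// Hypothetical wrapper: takes a complete reference such as "&#x34;"
// and passes only the name part ("#x34") to the decoder.
string decode_ref(string entity)
{
  if (sizeof(entity) < 3 || entity[0] != '&' || entity[-1] != ';')
    return 0;
  return Parser.decode_numeric_xml_entity(entity[1..<1]);
}

int main()
{
  write("%O\n", decode_ref("&#x34;"));   // Hexadecimal form.
  write("%O\n", decode_ref("&#52;"));    // Decimal form of the same character.
  return 0;
}
```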


Methodencode_html_entities

stringencode_html_entities(stringraw)

Description

Encode characters to HTML entities, e.g. turning "<" into "&lt;".

The characters that will be encoded are characters <= 32, "\"&'<>" and characters >= 127 and <= 160 and characters >= 255.


Methodget_xml_parser

HTMLget_xml_parser()

Description

Returns a Parser.HTML initialized for parsing XML. It has all the flags set properly for XML syntax and callbacks to ignore comments, CDATA blocks and unknown PI tags, but it has no registered tags and doesn't decode any entities.


Methodhtml_entity_parser
Methodparse_html_entities

HTMLhtml_entity_parser()
stringparse_html_entities(stringin)
HTMLhtml_entity_parser(intnoerror)
stringparse_html_entities(stringin, intnoerror)

Description

Parse any HTML entities in the string to Unicode characters. Either return a complete parser (to build on or use) or parse a string. Throws an error if there is an unrecognized entity in the string, unless noerror is set.

Note

Currently using XHTML 1.0 tables.
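
Example

The following sketch pairs parse_html_entities() with encode_html_entities() from the same module. The function name round_trip is made up for the example, and the inputs are kept free of whitespace because encode_html_entities() also encodes characters <= 32.

```pike
// Hypothetical helper: encodes a string to entities and decodes it again.
string round_trip(string raw)
{
  string encoded = Parser.encode_html_entities(raw);
  return Parser.parse_html_entities(encoded);
}

int main()
{
  write("%O\n", Parser.encode_html_entities("<b>"));
  write("%O\n", Parser.parse_html_entities("&lt;b&gt;"));
  write("%O\n", round_trip("<b>&"));
  return 0;
}
```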

Class Parser.CSV

Description

This is a parser for line oriented data that is either comma, semi-colon or tab separated. It extends the functionality of the Parser.Tabular with some specific functionality related to a header and record oriented parsing of huge datasets.

We document only the differences with the basic Parser.Tabular.

See also

Parser.Tabular


Methodfetchrecord

mappingfetchrecord(void|array|mappingformat)

Description

This function consumes a single record from the input. To be used in conjunction with parsehead().

Returns

It returns the mapping describing the record.

See also

parsehead(), fetch()


InheritTabular

inherit Parser.Tabular : Tabular


Methodparsehead

intparsehead(void|stringdelimiters, void|string|objectmatchfieldname)

Description

This function consumes the header-line preceding a typical comma, semicolon or tab separated value list and autocompiles a format description from that. After this function has successfully parsed a header-line, you can proceed with either fetchrecord() or fetch() to get the remaining records.

Parameter delimiters

Explicitly specify a string containing all the characters that should be considered field delimiters. If not specified or empty, the function will try to autodetect the single delimiter in use.

Parameter matchfieldname

A string containing a regular expression, using Regexp.SimpleRegexp syntax, or an object providing a Regexp.SimpleRegexp.match() single string argument compatible method, that must match all the individual fieldnames before the header will be considered valid.

Returns

It returns true if a CSV head has successfully been parsed.

See also

fetchrecord(), fetch(), compile()

Class Parser.RCS

Description

An RCS file parser that reads an RCS *,v file and presents its contents as Pike data structures.


Variableaccess

array(string) Parser.RCS.access

Description

The usernames listed in the ACCESS section of the RCS file.


Variablebranch

string|int(0..0) Parser.RCS.branch

Description

The default branch (or revision), if present, 0 otherwise.


Variablebranches

mapping(string:string) Parser.RCS.branches

Description

Maps branch numbers (indices) to branch names (values).

Note

The indices are short branch revision numbers (i.e. "1.1.2" and not "1.1.0.2").


Variablecomment

string|int(0..0) Parser.RCS.comment

Description

The RCS file comment if present, 0 otherwise.


Methodcreate

Parser.RCSParser.RCS(string|voidfile_name, string|int(0..0)|voidfile_contents, void|intmax_revisions)

Description

Initializes the RCS object.

Parameter file_name

The path to the raw RCS file (includes trailing ",v"). Used mainly for error reporting (truncated RCS file or similar). Stored in rcs_file_name.

Parameter file_contents

If a string is provided, that string will be parsed to initialize the RCS object. If a zero (0) is sent, no initialization will be performed at all. If no value is given at all, but file_name was provided, that file will be loaded and parsed for object initialization.

Parameter max_revisions

Maximum number of revisions to process. If unset, all revisions will be processed.


Variabledescription

string Parser.RCS.description

Description

The RCS file description.


Variableexpand

string Parser.RCS.expand

Description

The keyword expansion options (as named by RCS) if present, 0 otherwise.


Methodexpand_keywords_for_revision

stringexpand_keywords_for_revision(string|Revisionrev, string|voidtext, int|voidexpansion_mode)

Description

Expand keywords and return the resulting text according to the expansion rules set for the file.

Parameter rev

The revision to apply the expansion for.

Parameter text

If supplied, keywords are substituted in that text instead, using the values that would apply for the given revision. Otherwise, the contents of revision rev are used.

Parameter expansion_mode

Expansion mode

1

Perform expansion even if the file was checked in as binary.

0

Perform expansion only if the file was checked in as non-binary with expansion enabled.

-1

Perform contraction if the file was checked in as non-binary.

Note

The Log keyword (which lacks sane quoting rules) is not expanded. Keyword expansion rules set in CVSROOT/cvswrappers are ignored. Only implements the -kkv, -ko and -kb expansion modes.

Note

Does not perform any line-ending conversion.

See also

get_contents_for_revision


Methodget_contents_for_revision

stringget_contents_for_revision(string|Revisionrev, void|booldont_cache_data)

Description

Returns the file contents from the revision rev, without performing any keyword expansion. If dont_cache_data is set we will not keep intermediate revisions in memory unless they already existed. This will cut down memory use at the expense of slow access to older revisions.

See also

expand_keywords_for_revision()


Variablehead

string Parser.RCS.head

Description

Version number of the head version of the file.


Inherit_RCS

inherit Parser._RCS : _RCS


Variablelocks

mapping(string:string) Parser.RCS.locks

Description

Maps from username to revision for users that have acquired locks on this file.


Constantmax_revisions_supported

constantint Parser.RCS.max_revisions_supported

Description

Feature detection constant for the max_revisions argument to create(), parse() and parse_delta_sections().


Methodparse

this_programparse(arrayraw, void|function(string:void) progress_callback, void|intmax_revisions)

Description

Parse the RCS file raw and fully initialize all members of this object.

Parameter raw

The unprocessed RCS file.

Parameter progress_callback

Passed on to parse_deltatext_sections.

Parameter max_revisions

Maximum number of revisions to process. If unset, all revisions will be processed.

Returns

The fully initialized object (only returned for API convenience; the object itself is destructively modified to match the data extracted from raw)

See also

parse_admin_section, parse_delta_sections, parse_deltatext_sections, create


Methodparse_admin_section

arrayparse_admin_section(string|arrayraw)

Description

Lower-level API function for parsing only the admin section (the initial chunk of an RCS file, see manpage rcsfile(5)) of an RCS file. After running parse_admin_section, the RCS object will be initialized with the values for head, branch, access, branches, tokenize, tags, locks, strict_locks, comment and expand.

Parameter raw

The tokenized RCS file, or the raw RCS-file data.

Returns

The rest of the RCS file, admin section removed.

See also

parse_delta_sections, parse_deltatext_sections, parse, create

FIXME

Does not handle rcsfile(5) newphrase skipping.


Methodparse_delta_sections

arrayparse_delta_sections(arrayraw, void|intmax_revisions)

Description

Lower-level API function for parsing only the delta sections (the second chunk of an RCS file, see manpage rcsfile(5)) of an RCS file. After running parse_delta_sections, the RCS object will be initialized with the value of description and populated revisions mapping and trunk array. Their Revision members are however only populated with the members Revision->revision, Revision->branch, Revision->time, Revision->author, Revision->state, Revision->branches, Revision->rcs_next, Revision->ancestor and Revision->next.

Parameter raw

The tokenized RCS file, with admin section removed. (See parse_admin_section.)

Parameter max_revisions

Maximum number of revisions to process. If unset, all revisions will be processed.

Returns

The rest of the RCS file, delta sections removed.

See also

parse_admin_section, tokenize, parse_deltatext_sections, parse, create

FIXME

Does not handle rcsfile(5) newphrase skipping.


Methodparse_deltatext_sections

voidparse_deltatext_sections(arrayraw, void|function(string:void) progress_callback, array|voidcallback_args)

Description

Lower-level API function for parsing only the deltatext sections (the final and typically largest chunk of an RCS file, see manpage rcsfile(5)) of an RCS file. After a parse_deltatext_sections run, the RCS object will be fully populated.

Parameter raw

The tokenized RCS file, with admin and delta sections removed. (See parse_admin_section, tokenize and parse_delta_sections.)

Parameter progress_callback

This optional callback is invoked with the revision of the deltatext about to be parsed (useful for progress indicators).

Parameter callback_args

Optional extra trailing arguments to be sent to progress_callback

See also

parse_admin_section, parse_delta_sections, parse, create

FIXME

Does not handle rcsfile(5) newphrase skipping.


Variablercs_file_name

string Parser.RCS.rcs_file_name

Description

The filename of the RCS file as sent to create().


Variablerevisions

mapping(string:Revision) Parser.RCS.revisions

Description

Data for all revisions of the file. The indices of the mapping are the revision numbers, whereas the values are the data from the corresponding revision.


Variablestrict_locks

bool Parser.RCS.strict_locks

Description

1 if strict locking is set, 0 otherwise.


Variabletags

mapping(string:string) Parser.RCS.tags

Description

Maps tag names (indices) to tagged revision numbers (values).

Note

This mapping typically contains raw revision numbers for branches (i.e. "1.1.0.2" and not "1.1.2").
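
The raw branch form mentioned in the note can be turned into the ordinary branch number with plain string handling. The following is a minimal sketch, not part of Parser.RCS: normalize_branch is a hypothetical helper, and it assumes the CVS convention that a raw branch number has an even number of components with "0" in the second-to-last position.

```pike
// Hypothetical helper, not part of Parser.RCS.
// Assumes the CVS convention: even number of components, "0" second to last.
string normalize_branch(string rev)
{
  array(string) parts = rev / ".";
  int n = sizeof(parts);
  if (n >= 4 && !(n & 1) && parts[n - 2] == "0")
    // Drop the "0" component and keep the final branch component.
    return (parts[..n - 3] + ({ parts[n - 1] })) * ".";
  return rev;
}

int main()
{
  write("%s\n", normalize_branch("1.1.0.2"));  // prints 1.1.2
  write("%s\n", normalize_branch("1.2"));      // prints 1.2
  return 0;
}
```

Values taken from the tags mapping could be passed through such a helper before comparing them with Revision->branch.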


Methodtokenize

array(array(string)) tokenize(stringdata)

Description

Tokenize an RCS file into tokens suitable as argument to the various parse functions

Parameter data

The RCS file data

Returns

An array with arrays of tokens


Variabletrunk

array(Revision) Parser.RCS.trunk

Description

Data for all revisions on the trunk, sorted in the same order as the RCS file stored them, presumably descending with the most recent first (rcsfile(5) does not state the order).

Class Parser.RCS.DeltatextIterator

Description

Iterator for the deltatext sections of the RCS file. Typical usage:

Example

 // Read the raw RCS file.
 string raw = Stdio.read_file(my_rcs_filename);
 Parser.RCS rcs = Parser.RCS(my_rcs_filename, 0);
 // Parse the admin and delta sections; the deltatext sections remain.
 array deltatext = rcs->parse_delta_sections(rcs->parse_admin_section(raw));
 foreach(rcs->DeltatextIterator(deltatext); int n; Parser.RCS.Revision rev)
   do_something(rev);


Method`!

bool res = !Parser.RCS.DeltatextIterator()

Returns

1 if the iterator has processed all deltatext entries, 0 otherwise.


Method`+=

Parser.RCS.DeltatextIterator() += nsteps

Description

Advance nsteps sections.

Returns

Returns the iterator object.


Methodcreate

Parser.RCS.DeltatextIteratorParser.RCS.DeltatextIterator(arraydeltatext_section, void|function(string, mixed ... :void) progress_callback, void|array(mixed) progress_callback_args)

Parameter deltatext_section

the deltatext section of the RCS file in its entirety

Parameter progress_callback

This optional callback is invoked with the revision of the deltatext about to be parsed (useful for progress indicators).

Parameter progress_callback_args

Optional extra trailing arguments to be sent to progress_callback

See also

the rcsfile(5) manpage outlines the sections of an RCS file


Methodfirst

boolfirst()

Description

Restart not implemented; always returns 0 (==failed)


Methodindex

intindex()

Returns

the number of deltatext entries processed so far (0..N-1, N being the total number of revisions in the rcs file)


Syntax

int Parser.RCS.DeltatextIterator.n
protectedboolread_next()

Description

Drops the leading whitespace before next revision's deltatext entry and sets this_rev to the revision number we're about to read.


Methodnext

boolnext()

Description

like `+=(1), but returns 0 if the iterator is finished


Methodparse_deltatext_section

protectedintparse_deltatext_section(arrayraw, into)

Description

Chops off the first deltatext section from the token array raw and returns the rest of the string, or the value 0 (zero) if we had already visited the final deltatext entry. The deltatext's data is stored destructively in the appropriate entry of the revisions array.

Note

raw+o must start with a deltatext entry for this method to work

FIXME

does not handle rcsfile(5) newphrase skipping

FIXME

if the rcs file is truncated, this method writes a descriptive error to stderr and then returns 0 - some nicer error handling wouldn't hurt


Methodvalue

Revisionvalue()

Returns

The Revision whose deltatext data the iterator is currently at, updated with that data.

Class Parser.RCS.Revision

Description

All data tied to a particular revision of the file.


Variableadded

int Parser.RCS.Revision.added

Description

The number of lines that were added from the previous revision to make this revision (for the initial revision too).

See also

lines, removed


Variableancestor

string Parser.RCS.Revision.ancestor

Description

The revision of the ancestor of this revision, or 0 if this was the initial revision.

See also

next


Variableauthor

string Parser.RCS.Revision.author

Description

The userid of the user that committed the revision.


Variablebranch

string Parser.RCS.Revision.branch

Description

The branch name on which this revision was committed (calculated according to how cvs manages branches).


Variablebranches

array(string) Parser.RCS.Revision.branches

Description

When there are branches from this revision, an array with the first revision number for each of the branches, otherwise 0.

Follow the next fields to get to the branch head.


Variablelines

int Parser.RCS.Revision.lines

Description

The number of lines this revision contained, altogether (not of particular interest for binary files).

See also

added, removed


Variablelog

string Parser.RCS.Revision.log

Description

The log message associated with the revision.


Variablenext

string Parser.RCS.Revision.next

Description

The revision that succeeds this revision, or 0 if none exists (ie if this is the HEAD of the trunk or of a branch).

See also

ancestor


Variablercs_next

string Parser.RCS.Revision.rcs_next

Description

The revision stored next in the RCS file, or 0 if none exists.

Note

This field is straight from the RCS file, and has somewhat weird semantics. Usually you will want to use one of the derived fields next or prev or possibly rcs_prev.

See also

next, prev, rcs_prev


Variablercs_prev

string Parser.RCS.Revision.rcs_prev

Description

The revision that this revision is based on, or 0 if it is the HEAD.

This is the reverse pointer of rcs_next and branches, and is used by get_contents_for_revision() when applying the deltas to set text.

See also

rcs_next


Variablercs_text

string Parser.RCS.Revision.rcs_text

Description

The raw delta as stored in the RCS file.

See also

text, get_contents_for_revision()


Variableremoved

int Parser.RCS.Revision.removed

Description

The number of lines that were removed from the previous revision to make this revision.

See also

lines, added


Variablerevision

string Parser.RCS.Revision.revision

Description

The revision number (i.e. rcs_file->revisions["1.1"]->revision == "1.1").


Variablestate

string Parser.RCS.Revision.state

Description

The state of the revision - typically "Exp" or "dead".


Variabletext

string Parser.RCS.Revision.text

Description

The text as committed or 0 if get_contents_for_revision() hasn't been called for this revision yet.

Typically you don't access this field directly, but use get_contents_for_revision() to retrieve it.

See also

get_contents_for_revision(), rcs_text


Variabletime

Calendar.TimeRange Parser.RCS.Revision.time

Description

The (UTC) date and time when the revision was committed (second precision).

Class Parser.SGML

Description

This is a handy, simple parser for SGML-like syntax such as HTML. It does nothing advanced beyond finding the corresponding end tags.

It's used like this:

 array res=Parser.SGML()->feed(string)->finish()->result();

The resulting structure is an array of atoms, where the atom can be a string or a tag. A tag contains a similar array, as data.

Example

A string "<gat>&nbsp;<gurka>&nbsp;</gurka>&nbsp;<banan>&nbsp;<kiwi>&nbsp;</gat>" results in

<span class='delim'>(</span><span class='delim'>{</span>
   tag <span class='string'>"gat"</span> <span class='type'>object</span> with data<span class='delim'>:</span>
   <span class='delim'>(</span><span class='delim'>{</span>
       tag <span class='string'>"gurka"</span> <span class='type'>object</span> with data<span class='delim'>:</span>
       <span class='delim'>(</span><span class='delim'>{</span>
           <span class='string'>" "</span>
       <span class='delim'>}</span><span class='delim'>)</span>
       tag <span class='string'>"banan"</span> <span class='type'>object</span> with data<span class='delim'>:</span>
       <span class='delim'>(</span><span class='delim'>{</span>
           <span class='string'>" "</span>
           tag <span class='string'>"kiwi"</span> <span class='type'>object</span> with data<span class='delim'>:</span>
           <span class='delim'>(</span><span class='delim'>{</span>
              <span class='string'>" "</span>
           <span class='delim'>}</span><span class='delim'>)</span>
       <span class='delim'>}</span><span class='delim'>)</span>
   <span class='delim'>}</span><span class='delim'>)</span>
<span class='delim'>}</span><span class='delim'>)</span>

ie, simple "tags" (not containers) are not detected, but containers are ended implicitely by a surrounding container _with_ an end tag.

The 'tag' is an object with the following variables:

 string name;               - name of the tag
 mapping args;              - arguments to the tag
 int line, char, column;    - start position of the tag
 int eline, echar, ecolumn; - end position of the tag; src[char..echar-1] holds the block (added by Xuesong Guo)
 string file;               - filename (see create())
 array(SGMLatom) data;      - contained data
 int open;                  - set if the tag is not an empty element and has no end tag (added by Xuesong Guo)
     

Methodcreate

Parser.SGMLParser.SGML()
Parser.SGMLParser.SGML(stringfilename, function(:void)|voidname_formater, function(:void)|voidargname_formater)

Description

This object is created with this filename. It is passed to all created tags, for debug and trace purposes. Every tag name is replaced by name_formater(name), and every argument name is replaced by argname_formater(arg_name).

Note

No, it doesn't read the file itself. See feed().


Methodfeed
Methodfinish
Methodresult

objectfeed(strings)
array(SGMLatom|string) finish()
array(SGMLatom|string) result(strings)

Description

Feed new data to the object, or finish the stream. No result can be used until finish() is called.

Both finish() and result() return the computed data.

feed() returns the called object.
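
The nested result can be walked recursively. The sketch below assumes, as documented above, that finish() returns the computed data and that each tag atom exposes name and data; dump is a hypothetical helper name, and the exact atoms produced for this input are not asserted here.

```pike
// Hypothetical helper: print the atom tree, one atom per line.
void dump(array atoms, int depth)
{
  foreach (atoms, mixed atom) {
    if (stringp(atom))
      write("%s%O\n", "  " * depth, atom);
    else {
      write("%s<%s>\n", "  " * depth, atom->name);
      // Guard against a tag without contained data.
      dump(atom->data || ({}), depth + 1);
    }
  }
}

int main()
{
  Parser.SGML parser = Parser.SGML();
  parser->feed("<gat> <gurka> </gurka> <banan> <kiwi> </gat>");
  // finish() is documented to return the computed data.
  array result = parser->finish();
  dump(result, 0);
  return 0;
}
```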


Variablefile

string Parser.SGML.file

Class Parser.SGML.SGMLatom


Variablename
Variableargs
Variableline
Variablechar
Variablecolumn
Variableeline
Variableechar
Variableecolumn
Variablefile
Variabledata
Variableopen

string Parser.SGML.SGMLatom.name
mapping Parser.SGML.SGMLatom.args
int Parser.SGML.SGMLatom.line
int Parser.SGML.SGMLatom.char
int Parser.SGML.SGMLatom.column
int Parser.SGML.SGMLatom.eline
int Parser.SGML.SGMLatom.echar
int Parser.SGML.SGMLatom.ecolumn
string Parser.SGML.SGMLatom.file
array(SGMLatom) Parser.SGML.SGMLatom.data
int Parser.SGML.SGMLatom.open

Class Parser.Tabular

Description

This is a parser for line and block oriented data. It provides a flexible yet concise record-description language to parse character/column/delimiter-organised records.

See also

Parser.LR, http://www.wikipedia.org/wiki/Comma-separated_values, http://www.wikipedia.org/wiki/EDIFACT


Methodcompile

array|mappingcompile(string|Stdio.File|Stdio.FILEinput)

Description

Compiles the format description language into a compiled structure that can be fed to setformat, fetch, or create.

  • The format description is case sensitive.

  • The format description starts with a single line containing: [Tabular description begin]

  • The format description ends with a single line containing: [Tabular description end]

  • Any lines before the startline are skipped.

  • Any lines after the endline are not consumed.

  • Empty lines are skipped.

  • Comments start after a # or ;.

  • The depth level of a field is indicated by the number of leading spaces or colons at the beginning of the line.

  • The fieldname must not contain any whitespace.

  • An arbitrary number of single character field delimiters can be specified between brackets, e.g. [,;] or [,] would be for CSV.

  • When field delimiters are used: for CSV-type delimiters [\t,; ] the standard CSV quoting rules apply. For other delimiters no quoting is supported, and the last field on a line should not specify a delimiter but a fieldwidth of 0 instead.

  • A fixed field width can be specified by a plain decimal integer. A value of 0 indicates a field of arbitrary length that extends to the end of the line.

  • A matching regular expression can be enclosed in "". It has to match the complete field content and uses Regexp.SimpleRegexp syntax.

  • On records the following options are supported:

    mandatory

    This record is required.

    fold

    Fold this record's contents in the enclosing record.

    single

    This record is present at most once.

  • On fields the following options are supported:

    drop

    After reading and matching this field, drop the field content from the resulting mapping structure.

See also

setformat(), create(), fetch()

Example

Example of the description language:

[Tabular description begin]
csv
:gtz
::mybankno [,]
::transferdate [,]
::mutatiesoort [,]
::volgnummer [,]
::bankno [,]
::name [,]
::kostenplaats [,] drop
::amount [,]
::afbij [,]
::mutatie [,]
::reference [,]
::valutacode [,]
mt940
:messageheader1 mandatory
::exporttime "0000" drop
::CS1 " " drop
::exportday "01" drop
::exportaddress 12
::exportnumber 5 "[0-9]+"
:messageheader3 mandatory fold single
::messagetype "940" drop
::CS1 " " drop
::messagepriority "00" drop
:TRN fold
::tag ":20:" drop
::reference "GTZPB|MPBZ|INGEB"
:accountid fold
::tag ":25:" drop
::accountno 10
:statementno fold
::tag ":28C:" drop
::settlementno 0 drop
:openingbalance mandatory single
::tag ":60F:" drop
::creditdebit 1
::date 6
::currency "EUR"
::amount 0 "[0-9]+,[0-9][0-9]"
:statements
::statementline mandatory fold single
:::tag ":61:" drop
:::valuedate 6
:::creditdebit 1
:::amount "[0-9]+,[0-9][0-9]"
:::CS1 "N" drop
:::transactiontype 3 # 3 for Postbank, 4 for ING
:::paymentreference 0
::informationtoaccountowner fold single
:::tag ":86:" drop
:::accountno "[0-9]*( |)"
:::accountname 0
::description fold
:::description 0 "|[^:].*"
:closingbalance mandatory single
::tag ":62[FM]:" drop
::creditdebit 1
::date 6
::currency "EUR"
::amount 0 "[0-9]+,[0-9][0-9]"
:informationtoaccountowner fold single
::tag ":86:" drop
::debit "D" drop
::debitentries 6
::credit "C" drop
::creditentries 6
::debit "D" drop
::debitamount "[0-9]+,[0-9][0-9]"
::credit "C" drop
::creditamount "[0-9]+,[0-9][0-9]" drop
::accountname "(\n[^-:][^\n]*)*" drop
:messagetrailer mandatory single
::start "-"
::end "XXX"
[Tabular description end]


Methodcreate

Parser.TabularParser.Tabular(void|string|Stdio.File|Stdio.FILEinput, void|array|mapping|string|Stdio.File|Stdio.FILEformat, void|intverbose)

Description

This function initialises the parser.

Parameter input

The input stream or string.

Parameter format

The format to be used (either precompiled or not). The format description language is documented under compile().

Parameter verbose

If >1, it specifies the number of characters to display from the beginning of each record as a progress indicator. Special values are:

-4

Turns on format debugging with visible mismatches.

-3

Turns on format debugging with named field contents.

-2

Turns on format debugging with field contents.

-1

Turns on basic format debugging.

0

Turns off verbosity. Default.

1

Is the same as setting it to 70.

See also

compile(), setformat(), fetch()


Methodfeed

objectfeed(stringcontent)

Parameter content

Is injected into the input stream.

Returns

This object.

See also

fetch()


Methodfetch

mappingfetch(void|array|mappingformat)

Description

This function consumes as much input as needed to parse the full tabular structures at once.

Parameter format

Describes (precompiled only) formats to be parsed. If no format is specified, the format specified on create() is used, and empty lines are automatically skipped.

Returns

A nested mapping that contains the complete structure as described in the specified format.

If nothing matches the specified format, no input is consumed (except empty lines, if the default format is used), and zero is returned.

See also

compile(), create(), setformat(), skipemptylines()
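
As a rough illustration of how create() and fetch() fit together, the sketch below feeds a small CSV string through a two-field description. The record name entry, the field names and the sample data are made up for this example, and it assumes that a format passed as a string is treated as description text to be compiled. The shape of the returned mapping is simply printed rather than asserted.

```pike
// Hypothetical two-field CSV description; names are made up.
constant description = #"[Tabular description begin]
csv
:entry
::name [,]
::amount [,]
[Tabular description end]
";

int main()
{
  // Assumption: the string format is compiled by create().
  Parser.Tabular parser = Parser.Tabular("alice,10\nbob,20\n", description);

  // fetch() returns a nested mapping, or zero if nothing matched.
  mapping result = parser->fetch();
  if (result)
    write("%O\n", result);
  else
    write("no match\n");
  return 0;
}
```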


Methodsetformat

array|mappingsetformat(array|mappingformat)

Parameter format

Replaces the default (precompiled only) format.

Returns

The previous default format.

See also

compile(), fetch()


Methodskipemptylines

intskipemptylines()

Description

This function can be used to manually skip empty lines in the input. This is unnecessary if no argument is specified for fetch().

Returns

It returns true if EOF has been reached.

See also

fetch()

Module Parser.C


Methodgroup

array(Token|array) group(array(string|Token) tokens, void|mapping(string:string) groupings)

Description

Fold sub blocks of an array of tokens into sub arrays, for grouping purposes.

Parameter tokens

The token array to fold.

Parameter groupings

Supplies the tokens marking the boundaries of blocks to fold. The indices of the mapping mark the start of a block, the corresponding values mark where the block ends. The sub arrays will start and end in these tokens. If no groupings mapping is provided, {}, () and [] are used as block boundaries.


Methodhide_whitespaces

arrayhide_whitespaces(arraytokens)

Description

Folds all whitespace tokens into the previous token's trailing_whitespaces.


Methodreconstitute_with_line_numbers

stringreconstitute_with_line_numbers(array(string|Token|array) tokens)

Description

Like simple_reconstitute, but adds #line n "file" preprocessor statements to the output wherever a new line or file starts.


Methodsimple_reconstitute

stringsimple_reconstitute(array(string|Token|array) tokens)

Description

Reconstitutes the token array into a plain string again; essentially reversing split() and whichever of the tokenize, group and hide_whitespaces methods may have been invoked.


Methodsplit

array(string) split(stringdata, void|mapping(string:string) state)

Description

Splits the data string into an array of tokens. An additional element with a newline will be added to the resulting array of tokens. If the optional argument state is provided the split function is able to pause and resume splitting inside #"" and /**/ tokens. The state argument should be an initially empty mapping, in which split will store its state between successive calls.


Methodstrip_line_statements

array(Token|array) strip_line_statements(array(Token|array) tokens)

Description

Strips off all (preprocessor) line statements from a token array.


Methodtokenize

array(Token) tokenize(array(string) s, void|stringfile)

Description

Returns an array of Token objects given an array of string tokens.
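
The module-level functions are typically chained. The sketch below runs a small C fragment through split(), tokenize(), hide_whitespaces() and group(), and then turns the result back into text with simple_reconstitute(). The source string and the file name example.c are made up for illustration.

```pike
int main()
{
  string source = "int add(int a, int b) { return a + b; }\n";

  // Raw string tokens, including whitespace tokens.
  array(string) parts = Parser.C.split(source);

  // Token objects; "example.c" is a made-up file name.
  array tokens = Parser.C.tokenize(parts, "example.c");

  // Fold whitespace into the preceding tokens, then fold {}, () and []
  // blocks into sub arrays (the default groupings).
  tokens = Parser.C.hide_whitespaces(tokens);
  array grouped = Parser.C.group(tokens);

  write("%d top-level entries\n", sizeof(grouped));
  write("%s", Parser.C.simple_reconstitute(grouped));
  return 0;
}
```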

Class Parser.C.Token

Description

Represents a C token, along with a selection of associated data and operations.


Method_sprintf

stringsprintf(stringformat, ... Parser.C.Tokenarg ... )

Description

If the object is printed as %s it will only output its text contents.


Method`+

string res = Parser.C.Token() + s

Description

A string can be added to the Token, which will be added to the text contents.


Method`==

int res = Parser.C.Token() == foo

Description

Tokens are considered equal if the text contents are equal. It is also possible to compare the Token object with a text string directly.


Method`[]

int|string res = Parser.C.Token()[ a ]

Description

Characters and ranges may be indexed from the text contents of the token.


Method``+

string res = s + Parser.C.Token()

Description

A string can be added to the Token, which will be added to the text contents.


Methodcast

(int)Parser.C.Token()
(float)Parser.C.Token()
(string)Parser.C.Token()
(array)Parser.C.Token()
(mapping)Parser.C.Token()
(multiset)Parser.C.Token()

Description

It is possible to cast a Token object to a string. The text content will be returned.


Methodcreate

Parser.C.TokenParser.C.Token(stringtext, void|intline, void|stringfile, void|stringtrailing_whitespace)


Variablefile

string Parser.C.Token.file

Description

The file in which the token was found.


Variableline

int Parser.C.Token.line

Description

The line where the token was found.


Variabletext

string Parser.C.Token.text

Description

The actual token.


Variabletrailing_whitespaces

string Parser.C.Token.trailing_whitespaces

Description

Trailing whitespaces.
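
The operators and variables documented for Token can be tried out on a hand-made token. This is a small sketch; the token text, line number and file name are arbitrary illustration values.

```pike
int main()
{
  // Line 1 of a made-up file, followed by one space of whitespace.
  Parser.C.Token tok = Parser.C.Token("return", 1, "example.c", " ");

  write("%s\n", (string)tok);       // cast yields the text contents
  write("%d\n", tok == "return");   // comparison against a plain string
  write("%c\n", tok[0]);            // indexing the text contents
  write("%s:%d\n", tok->file, tok->line);
  return 0;
}
```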

Class Parser.C.UnterminatedStringError

Description

Error thrown when an unterminated string token is encountered.


Variableerr_str

string Parser.C.UnterminatedStringError.err_str

Description

The string that failed to be tokenized


InheritGeneric

inherit Error.Generic : Generic

Module Parser.LR

Description

LALR(1) parser generator.

Enum Parser.LR.SeverityLevel

Description

Severity level


ConstantNOTICE
ConstantWARNING
ConstantERROR

constant Parser.LR.NOTICE
constant Parser.LR.WARNING
constant Parser.LR.ERROR

Class Parser.LR.ErrorHandler

Description

Class handling reporting of errors and warnings.


Methodcreate

Parser.LR.ErrorHandlerParser.LR.ErrorHandler(int(-1..1)|voidverbosity)

Description

Create a new error handler.

Parameter verbosity

Level of verbosity.

See also

verbose


Variableverbose

optionalint(-1..1) Parser.LR.ErrorHandler.verbose

Description

Verbosity level

-1

Just errors.

0

Errors and warnings.

1

Also notices.

Class Parser.LR.Parser

Description

This object implements an LALR(1) parser and compiler.

Normal use of this object would be:

 set_error_handler
 {add_rule, set_priority, set_associativity}*
 set_symbol_to_string
 compile
 {parse}*
 

Method_sprintf

stringsprintf(stringformat, ... Parser.LR.Parserarg ... )

Description

Pretty-prints the current grammar to a string.


Methodadd_rule

voidadd_rule(Ruler)

Description

Add a rule to the grammar.

Parameter r

Rule to add.


Methodcast

(int)Parser.LR.Parser()
(float)Parser.LR.Parser()
(string)Parser.LR.Parser()
(array)Parser.LR.Parser()
(mapping)Parser.LR.Parser()
(multiset)Parser.LR.Parser()

Description

Implements casting.

Parameter type

Type to cast to.


Methodcompile

intcompile()

Description

Compiles the grammar into a parser, so that parse() can be called.


Variableerror_handler

function(SeverityLevel, string, string, mixed ... :void) Parser.LR.Parser.error_handler

Description

Compile error and warning handler.


Variablegrammar

mapping(int:array(Rule)) Parser.LR.Parser.grammar

Description

The grammar itself.


Methoditem_to_string

stringitem_to_string(Itemi)

Description

Pretty-prints an item to a string.

Parameter i

Item to pretty-print.


Variableknown_states

mapping(string:Kernel) Parser.LR.Parser.known_states

Description

LR0 states that are already known to the compiler.


Variablelr_error

int Parser.LR.Parser.lr_error

Description

Error code


Methodparse

mixedparse(object|function(void:string|array(string|mixed)) scanner, void|objectaction_object)

Description

Parse the input according to the compiled grammar. The last value reduced is returned.

Note

The parser must have been compiled (with compile()) prior to calling this function.

Bugs

Errors should be throw()n.

Parameter scanner

The scanner function. It returns the next symbol from the input. It should either return a string (terminal) or an array with a string (terminal) and a mixed (value). EOF is indicated with the empty string.

Parameter action_object

Object used to resolve those actions that have been specified as strings.


Methodrule_to_string

stringrule_to_string(Ruler)

Description

Pretty-prints a rule to a string.

Parameter r

Rule to print.


Variables_q

StateQueue Parser.LR.Parser.s_q

Description

Contains all states used. In the queue section are the states that remain to be compiled.


Methodset_associativity

voidset_associativity(stringterminal, intassoc)

Description

Sets the associativity of a terminal.

Parameter terminal

Terminal to set the associativity for.

Parameter assoc

Associativity; negative - left, positive - right, zero - no associativity.


Methodset_error_handler

voidset_error_handler(void|function(SeverityLevel, string, string, mixed ... :void) handler)

Description

Sets the error report function.

Parameter handler

Function to call to report errors and warnings. If zero or not specified, use the built-in function.


Methodset_priority

voidset_priority(stringterminal, intpri_val)

Description

Sets the priority of a terminal.

Parameter terminal

Terminal to set the priority for.

Parameter pri_val

Priority; higher = prefer this terminal.


Methodset_symbol_to_string

voidset_symbol_to_string(void|function(int|string:string) s_to_s)

Description

Sets the symbol to string conversion function. The conversion function is used by the various *_to_string functions to make comprehensible output.

Parameter s_to_s

Symbol to string conversion function. If zero or not specified, use the built-in function.


Variablestart_state

Kernel Parser.LR.Parser.start_state

Description

The initial LR0 state.


Methodstate_to_string

stringstate_to_string(Kernelstate)

Description

Pretty-prints a state to a string.

Parameter state

State to pretty-print.

Class Parser.LR.Parser.Item

Description

An LR(0) item, a partially parsed rule.


Variablecounter

int Parser.LR.Parser.Item.counter

Description

Depth counter (used when compiling).


Variabledirect_lookahead

multiset(string) Parser.LR.Parser.Item.direct_lookahead

Description

Look-ahead set for this item.


Variableerror_lookahead

multiset(string) Parser.LR.Parser.Item.error_lookahead

Description

Look-ahead set used for detecting conflicts


Variableitem_id

int Parser.LR.Parser.Item.item_id

Description

Used to identify the item. Equal to r->number + offset.


Variablemaster_item

Item Parser.LR.Parser.Item.master_item

Description

Item representing this one (used for shifts).


Variablenext_state

Kernel Parser.LR.Parser.Item.next_state

Description

The state we will get if we shift according to this rule


Variablenumber

int Parser.LR.Parser.Item.number

Description

Item identification number (used when compiling).


Variableoffset

int Parser.LR.Parser.Item.offset

Description

How long into the rule the parsing has come.


Variabler

Rule Parser.LR.Parser.Item.r

Description

The rule


Variablerelation

multiset(Item) Parser.LR.Parser.Item.relation

Description

Relation to other items (used when compiling).

Class Parser.LR.Parser.Kernel

Description

Implements an LR(1) state


Variableaction

mapping(int|string:Kernel|Rule) Parser.LR.Parser.Kernel.action

Description

The action table for this state

 object(kernel)    SHIFT to this state on this symbol.
 object(rule)      REDUCE according to this rule on this symbol.
 

Methodadd_item

voidadd_item(Itemi)

Description

Add an item to the state.


Methodclosure

voidclosure(intnonterminal)

Description

Make the closure of this state.

Parameter nonterminal

Nonterminal to make the closure on.


Variableclosure_set

multiset Parser.LR.Parser.Kernel.closure_set

Description

The symbols that closure has been called on.


Methoddo_goto

Kerneldo_goto(int|stringsymbol)

Description

Generates the state reached when doing goto on the specified symbol, i.e. it compiles the LR(0) state.

Parameter symbol

Symbol to make goto on.


Methodgoto_set

multiset(int|string) goto_set()

Description

Make the goto-set of this state.


Variableitem_id_to_item

mapping(int:Item) Parser.LR.Parser.Kernel.item_id_to_item

Description

Used to lookup items given rule and offset


Variableitems

array(Item) Parser.LR.Parser.Kernel.items

Description

Contains the items in this state.


Variablerules

multiset(Rule) Parser.LR.Parser.Kernel.rules

Description

Used to check if a rule already has been added when doing closures.


Variablesymbol_items

mapping(int:multiset(Item)) Parser.LR.Parser.Kernel.symbol_items

Description

Contains the items whose next symbol is this non-terminal.

Class Parser.LR.Parser.StateQueue

Description

This is a queue, which keeps the elements even after they are retrieved.


Variablearr

array(Kernel) Parser.LR.Parser.StateQueue.arr

Description

The queue itself.


Variablehead

int Parser.LR.Parser.StateQueue.head

Description

Index of the head of the queue.


Methodnext

Kernelnext()

Description

Return the next state from the queue.


Methodpush

Kernelpush(Kernelstate)

Description

Pushes the state on the queue.

Parameter state

State to push.


Variabletail

int Parser.LR.Parser.StateQueue.tail

Description

Index of the tail of the queue.
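
The idea of a queue that keeps its elements after retrieval can be sketched independently of the parser. The class below is an illustrative re-implementation, not the code of Parser.LR.Parser.StateQueue; the name RetainingQueue and the exact meaning given to head and tail here are assumptions.

```pike
// Illustrative re-implementation; not the library's StateQueue.
class RetainingQueue
{
  array arr = ({});  // every element ever pushed, in order
  int head = 0;      // index of the next element to hand out (assumption)
  int tail = 0;      // index one past the last pushed element (assumption)

  mixed push(mixed item)
  {
    arr += ({ item });
    tail = sizeof(arr);
    return item;
  }

  // Returns the next unvisited element, or 0 when all have been visited.
  mixed next()
  {
    if (head >= tail) return 0;
    return arr[head++];
  }
}

int main()
{
  RetainingQueue q = RetainingQueue();
  q->push("s0");
  q->push("s1");
  write("%O\n", q->next());       // prints "s0"
  write("%d\n", sizeof(q->arr));  // prints 2: retrieved elements are kept
  return 0;
}
```

Keeping retrieved elements lets the same array serve both as the work queue and as the list of all states seen so far.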

Class Parser.LR.Priority

Description

Specifies the priority and associativity of a rule.


Variableassoc

int Parser.LR.Priority.assoc

Description

Associativity

-1

Left

0

None

1

Right


Methodcreate

Parser.LR.PriorityParser.LR.Priority(intp, inta)

Description

Create a new priority object.

Parameter p

Priority.

Parameter a

Associativity.


Variablevalue

int Parser.LR.Priority.value

Description

Priority value

Class Parser.LR.Rule

Description

This object is used to represent a BNF-rule in the LR parser.


Variableaction

function(:void)|string Parser.LR.Rule.action

Description

Action to do when reducing this rule. If it is a function, that function is called. If it is a string, the function with that name in the object given to the parser is called. The function is called with arguments corresponding to the values of the elements of the rule. The return value of the function will be the value of this non-terminal. The default action is to return the first argument.


Methodcreate

Parser.LR.Rule Parser.LR.Rule(int nt, array(string|int) r, function(:void)|string|void a)

Description

Create a BNF rule.

Example

The rule

rule : nonterminal ":" symbols ";" { add_rule };

might be created as

Parser.LR.Rule(4, ({ 9, ":", 5, ";" }), "add_rule");

where 4 corresponds to the nonterminal "rule", 9 to "nonterminal" and 5 to "symbols", and the function "add_rule" is to be called when this rule is reduced.

Parameter nt

Non-terminal to reduce to.

Parameter r

Symbol sequence that reduces to nt.

Parameter a

Action to do when reducing according to this rule. If it is a function, that function is called. If it is a string, the function with that name in the object given to the parser is called. The function is called with arguments corresponding to the values of the elements of the rule. The return value of the function will become the value of this non-terminal. The default action is to return the first argument.
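The three forms of the action parameter (function, name of a function in the action object, or nothing) can be modelled in a few lines. The Python sketch below only mirrors the dispatch described above and is not the Pike implementation; run_action, the Actions class and the handling of an empty rule are assumptions made for illustration.

```python
def run_action(action, action_object, values):
    """Compute the value of the non-terminal produced by a reduction.

    action        -- a callable, the name of a method on action_object, or None
    action_object -- object that named actions are looked up in
    values        -- values of the symbols on the rule's right-hand side
    """
    if action is None:
        # Default action: the value of the first element of the rule.
        # Assumption: an empty rule yields None.
        return values[0] if values else None
    if isinstance(action, str):
        # Named action: look the function up in the action object.
        action = getattr(action_object, action)
    return action(*values)


class Actions:  # hypothetical action object
    def add(self, left, _plus, right):
        return left + right


by_name = run_action("add", Actions(), [1, "+", 2])
by_function = run_action(lambda a, _op, b: a * b, None, [3, "*", 4])
by_default = run_action(None, None, ["x", "y"])
```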


Variablehas_tokens

int Parser.LR.Rule.has_tokens

Description

This rule contains tokens


Variablenonterminal

int Parser.LR.Rule.nonterminal

Description

Non-terminal this rule reduces to.


Variablenum_nonnullables

int Parser.LR.Rule.num_nonnullables

Description

This rule has this many non-nullable symbols at the moment.


Variablenumber

int Parser.LR.Rule.number

Description

Sequence number of this rule (used for conflict resolution). Also used to identify the rule.


Variablepri

Priority Parser.LR.Rule.pri

Description

Priority and associativity of this rule.


Variablesymbols

array(string|int) Parser.LR.Rule.symbols

Description

The actual rule, i.e. the symbol sequence that reduces to the non-terminal.

Module Parser.LR.GrammarParser

Description

This module generates an LR parser from a grammar specified according to the following grammar:

        directives : directive ;
        directives : directives directive ;
        directive : declaration ;
        directive : rule ;
        declaration : "%token" terminals ";" ;
        rule : nonterminal ":" symbols ";" ;
        rule : nonterminal ":" symbols action ";" ;
        symbols : symbol ;
        symbols : symbols symbol ;
        terminals : terminal ;
        terminals : terminals terminal ;
        symbol : nonterminal ;
        symbol : "string" ;
        action : "{" "identifier" "}" ;
        nonterminal : "identifier" ;
        terminal : "string" ;
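To make the specification format concrete, here is a small Python sketch that reads a text written in the grammar above and lists its declared terminals and rules. It is an illustration of the format only, not how GrammarParser works internally; the names tokenize and parse_spec, the sample specification and the minimal error handling are all assumptions.

```python
import re

# One token: "%token", a quoted string, an identifier, or one of : ; { }
TOKEN_RE = re.compile(r'\s*(%token|"[^"]*"|[A-Za-z_]\w*|[:;{}])')


def tokenize(spec):
    """Split a specification into a flat list of token strings."""
    tokens, pos = [], 0
    spec = spec.rstrip()
    while pos < len(spec):
        m = TOKEN_RE.match(spec, pos)
        if not m:
            raise SyntaxError(f"unexpected input at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse_spec(spec):
    """Return (terminals, rules); a rule is (nonterminal, symbols, action)."""
    toks = tokenize(spec)
    terminals, rules, i = [], [], 0
    while i < len(toks):
        if toks[i] == "%token":          # declaration : "%token" terminals ";"
            i += 1
            while toks[i] != ";":
                terminals.append(toks[i])
                i += 1
            i += 1                       # skip ";"
            continue
        nonterminal = toks[i]            # rule : nonterminal ":" symbols ... ";"
        if toks[i + 1] != ":":
            raise SyntaxError(f"expected ':' after {nonterminal}")
        i += 2
        symbols, action = [], None
        while toks[i] not in (";", "{"):
            symbols.append(toks[i])
            i += 1
        if toks[i] == "{":               # action : "{" identifier "}"
            action = toks[i + 1]
            i += 3
        i += 1                           # skip ";"
        rules.append((nonterminal, symbols, action))
    return terminals, rules


SPEC = '''
%token "number" ;
expr : expr "+" term { add } ;
expr : term ;
term : "number" ;
'''
terminals, rules = parse_spec(SPEC)
```

Quoted strings keep their quotes in the result so that terminals can be told apart from non-terminal identifiers.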
 

Variablelr_error

int Parser.LR.GrammarParser.lr_error

Description

Error code from the parsing.


Methodmake_parser

Parser make_parser(string str, object|void m)

Description

Compiles the parser-specification given in the first argument. Named actions are taken from the object if available, otherwise left as is.

Bugs

Returns error-code in both GrammarParser.error and return_value->lr_error.


Methodmake_parser_from_file

int|Parser make_parser_from_file(string fname, object|void m)

Description

Compiles the file specified in the first argument into an LR parser.

See also

make_parser()

Module Parser.Pike

Description

This module parses and tokenizes Pike source code.


Inherit"C.pmod"

inherit "C.pmod" : "C.pmod"

Module Parser.Python


Methodsplit

array(string) split(string data)

Description

Splits the provided string of Python code into tokens and returns them as an array.

Module Parser._parser

Description

Low-level helpers for parsers.

Note

You probably don't want to use the modules contained in this module directly; use the other Parser modules listed below instead.

See also

Parser, Parser.C, Parser.Pike, Parser.RCS, Parser.HTML, Parser.XML

Module Parser._parser._C

Description

Low-level helpers for Parser.C.

Note

You probably want to use Parser.C instead of this module.

See also

Parser.C, _Pike.


Methodtokenize

array(array(string)|string) tokenize(string code)

Description

Tokenize a string of C tokens.

Note

Don't use this function directly. Use Parser.C.tokenize() instead.

Returns

Returns an array containing an array of C-level tokens, followed by the remainder (a partial token), if any.
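The return shape, a list of complete tokens plus a trailing partial token, can be illustrated with a toy tokenizer. The Python sketch below uses deliberately simplified token classes that are assumptions and do not match the real C tokenizer; only the split between complete tokens and remainder is the point. The name split_with_remainder is hypothetical.

```python
import re

# Simplified complete tokens: identifiers, integers, closed string literals,
# whitespace runs, and single punctuation characters other than '"'.
COMPLETE = re.compile(r'[^\W\d]\w*|\d+|"(?:\\.|[^"\\])*"|\s+|[^\w\s"]')


def split_with_remainder(code):
    """Return [tokens, remainder]; remainder is the unconsumed partial token."""
    tokens, pos = [], 0
    while pos < len(code):
        m = COMPLETE.match(code, pos)
        if not m:
            break  # e.g. a string literal whose closing quote has not arrived
        tokens.append(m.group(0))
        pos = m.end()
    return [tokens, code[pos:]]


complete = split_with_remainder('x = "ab";')
partial = split_with_remainder('x = "ab')
```

A caller that reads input in chunks can prepend the remainder to the next chunk before tokenizing again.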

Module Parser._parser._Pike

Description

Low-level helpers for Parser.Pike.

Note

You probably want to use Parser.Pike instead of this module.

See also

Parser.Pike, _C.


Methodtokenize

array(array(string)|string) tokenize(string code)

Description

Tokenize a string of Pike tokens.

Returns

Returns an array containing an array of Pike-level tokens, followed by the remainder (a partial token), if any.

Module Parser._parser._RCS

Description

Low-level helpers for Parser.RCS.

Note

You probably want to use Parser.RCS instead of this module.

See also

Parser.RCS


Methodtokenize

array(array(string)) tokenize(string code)

Description

Tokenize a string of RCS tokens.

Note

Don't use this function directly. Use Parser.RCS.tokenize() instead.

See also

Parser.RCS.tokenize()