ai-content-maker/.venv/Lib/site-packages/nltk/test/tree.doctest

1224 lines
45 KiB
Plaintext
Raw Normal View History

2024-05-03 04:18:51 +03:00
.. Copyright (C) 2001-2023 NLTK Project
.. For license information, see LICENSE.TXT
===============================
Unit tests for nltk.tree.Tree
===============================
>>> from nltk.tree import *
Some trees to run tests on:
>>> dp1 = Tree('dp', [Tree('d', ['the']), Tree('np', ['dog'])])
>>> dp2 = Tree('dp', [Tree('d', ['the']), Tree('np', ['cat'])])
>>> vp = Tree('vp', [Tree('v', ['chased']), dp2])
>>> tree = Tree('s', [dp1, vp])
>>> print(tree)
(s (dp (d the) (np dog)) (vp (v chased) (dp (d the) (np cat))))
The node label is accessed using the `label()` method:
>>> dp1.label(), dp2.label(), vp.label(), tree.label()
('dp', 'dp', 'vp', 's')
>>> print(tree[1,1,1,0])
cat
The `treepositions` method returns a list of the tree positions of
subtrees and leaves in a tree. By default, it gives the position of
every tree, subtree, and leaf, in prefix order:
>>> print(tree.treepositions())
[(), (0,), (0, 0), (0, 0, 0), (0, 1), (0, 1, 0), (1,), (1, 0), (1, 0, 0), (1, 1), (1, 1, 0), (1, 1, 0, 0), (1, 1, 1), (1, 1, 1, 0)]
In addition to `str` and `repr`, several methods exist to convert a
tree object to one of several standard tree encodings:
>>> print(tree.pformat_latex_qtree())
\Tree [.s
[.dp [.d the ] [.np dog ] ]
[.vp [.v chased ] [.dp [.d the ] [.np cat ] ] ] ]
There is also a fancy ASCII art representation:
>>> tree.pretty_print()
s
________|_____
| vp
| _____|___
dp | dp
___|___ | ___|___
d np v d np
| | | | |
the dog chased the cat
>>> tree.pretty_print(unicodelines=True, nodedist=4)
s
┌──────────────┴────────┐
│ vp
│ ┌────────┴──────┐
dp │ dp
┌──────┴──────┐ │ ┌──────┴──────┐
d np v d np
│ │ │ │ │
the dog chased the cat
Trees can be initialized from treebank strings:
>>> tree2 = Tree.fromstring('(S (NP I) (VP (V enjoyed) (NP my cookie)))')
>>> print(tree2)
(S (NP I) (VP (V enjoyed) (NP my cookie)))
Trees can be compared for equality:
>>> tree == Tree.fromstring(str(tree))
True
>>> tree2 == Tree.fromstring(str(tree2))
True
>>> tree == tree2
False
>>> tree == Tree.fromstring(str(tree2))
False
>>> tree2 == Tree.fromstring(str(tree))
False
>>> tree != Tree.fromstring(str(tree))
False
>>> tree2 != Tree.fromstring(str(tree2))
False
>>> tree != tree2
True
>>> tree != Tree.fromstring(str(tree2))
True
>>> tree2 != Tree.fromstring(str(tree))
True
>>> tree < tree2 or tree > tree2
True
Tree Parsing
============
The class method `Tree.fromstring()` can be used to parse trees, and it
provides some additional options.
>>> tree = Tree.fromstring('(S (NP I) (VP (V enjoyed) (NP my cookie)))')
>>> print(tree)
(S (NP I) (VP (V enjoyed) (NP my cookie)))
When called on a subclass of `Tree`, it will create trees of that
type:
>>> tree = ImmutableTree.fromstring('(VP (V enjoyed) (NP my cookie))')
>>> print(tree)
(VP (V enjoyed) (NP my cookie))
>>> print(type(tree))
<class 'nltk.tree.immutable.ImmutableTree'>
>>> tree[1] = 'x'
Traceback (most recent call last):
. . .
ValueError: ImmutableTree may not be modified
>>> del tree[0]
Traceback (most recent call last):
. . .
ValueError: ImmutableTree may not be modified
The ``brackets`` parameter can be used to specify two characters that
should be used as brackets:
>>> print(Tree.fromstring('[S [NP I] [VP [V enjoyed] [NP my cookie]]]',
... brackets='[]'))
(S (NP I) (VP (V enjoyed) (NP my cookie)))
>>> print(Tree.fromstring('<S <NP I> <VP <V enjoyed> <NP my cookie>>>',
... brackets='<>'))
(S (NP I) (VP (V enjoyed) (NP my cookie)))
If ``brackets`` is not a string, or is not exactly two characters,
then `Tree.fromstring` raises an exception:
>>> Tree.fromstring('<VP <V enjoyed> <NP my cookie>>', brackets='')
Traceback (most recent call last):
. . .
TypeError: brackets must be a length-2 string
>>> Tree.fromstring('<VP <V enjoyed> <NP my cookie>>', brackets='<<>>')
Traceback (most recent call last):
. . .
TypeError: brackets must be a length-2 string
>>> Tree.fromstring('<VP <V enjoyed> <NP my cookie>>', brackets=12)
Traceback (most recent call last):
. . .
TypeError: brackets must be a length-2 string
>>> Tree.fromstring('<<NP my cookie>>', brackets=('<<','>>'))
Traceback (most recent call last):
. . .
TypeError: brackets must be a length-2 string
(We may add support for multi-character brackets in the future, in
which case the ``brackets=('<<','>>')`` example would start working.)
Whitespace brackets are not permitted:
>>> Tree.fromstring('(NP my cookie\n', brackets='(\n')
Traceback (most recent call last):
. . .
TypeError: whitespace brackets not allowed
If an invalid tree is given to Tree.fromstring, then it raises a
ValueError, with a description of the problem:
>>> Tree.fromstring('(NP my cookie) (NP my milk)')
Traceback (most recent call last):
. . .
ValueError: Tree.fromstring(): expected 'end-of-string' but got '(NP'
at index 15.
"...y cookie) (NP my mil..."
^
>>> Tree.fromstring(')NP my cookie(')
Traceback (most recent call last):
. . .
ValueError: Tree.fromstring(): expected '(' but got ')'
at index 0.
")NP my coo..."
^
>>> Tree.fromstring('(NP my cookie))')
Traceback (most recent call last):
. . .
ValueError: Tree.fromstring(): expected 'end-of-string' but got ')'
at index 14.
"...my cookie))"
^
>>> Tree.fromstring('my cookie)')
Traceback (most recent call last):
. . .
ValueError: Tree.fromstring(): expected '(' but got 'my'
at index 0.
"my cookie)"
^
>>> Tree.fromstring('(NP my cookie')
Traceback (most recent call last):
. . .
ValueError: Tree.fromstring(): expected ')' but got 'end-of-string'
at index 13.
"... my cookie"
^
>>> Tree.fromstring('')
Traceback (most recent call last):
. . .
ValueError: Tree.fromstring(): expected '(' but got 'end-of-string'
at index 0.
""
^
Trees with no children are supported:
>>> print(Tree.fromstring('(S)'))
(S )
>>> print(Tree.fromstring('(X (Y) (Z))'))
(X (Y ) (Z ))
Trees with an empty node label and no children are supported:
>>> print(Tree.fromstring('()'))
( )
>>> print(Tree.fromstring('(X () ())'))
(X ( ) ( ))
Trees with an empty node label and children are supported, but only if the
first child is not a leaf (otherwise, it will be treated as the node label).
>>> print(Tree.fromstring('((A) (B) (C))'))
( (A ) (B ) (C ))
>>> print(Tree.fromstring('((A) leaf)'))
( (A ) leaf)
>>> print(Tree.fromstring('(((())))'))
( ( ( ( ))))
The optional arguments `read_node` and `read_leaf` may be used to
transform the string values of nodes or leaves.
>>> print(Tree.fromstring('(A b (C d e) (F (G h i)))',
... read_node=lambda s: '<%s>' % s,
... read_leaf=lambda s: '"%s"' % s))
(<A> "b" (<C> "d" "e") (<F> (<G> "h" "i")))
These transformation functions are typically used when the node or
leaf labels should be parsed to a non-string value (such as a feature
structure). If node and leaf labels need to be able to include
whitespace, then you must also use the optional `node_pattern` and
`leaf_pattern` arguments.
>>> from nltk.featstruct import FeatStruct
>>> tree = Tree.fromstring('([cat=NP] [lex=the] [lex=dog])',
... read_node=FeatStruct, read_leaf=FeatStruct)
>>> tree.set_label(tree.label().unify(FeatStruct('[num=singular]')))
>>> print(tree)
([cat='NP', num='singular'] [lex='the'] [lex='dog'])
The optional argument ``remove_empty_top_bracketing`` can be used to
remove any top-level empty bracketing that occurs.
>>> print(Tree.fromstring('((S (NP I) (VP (V enjoyed) (NP my cookie))))',
... remove_empty_top_bracketing=True))
(S (NP I) (VP (V enjoyed) (NP my cookie)))
It will not remove a top-level empty bracketing with multiple children:
>>> print(Tree.fromstring('((A a) (B b))'))
( (A a) (B b))
Tree.fromlist()
---------------
The class method `Tree.fromlist()` can be used to parse trees
that are expressed as nested lists, such as those produced by
the tree() function from the wordnet module.
>>> from nltk.corpus import wordnet as wn
>>> t=Tree.fromlist(wn.synset('dog.n.01').tree(lambda s:s.hypernyms()))
>>> print(t.height())
14
>>> print(t.leaves())
["Synset('entity.n.01')", "Synset('entity.n.01')"]
>>> t.pretty_print()
Synset('dog.n.01')
_________________|__________________
Synset('canine.n. |
02') |
| |
Synset('carnivor |
e.n.01') |
| |
Synset('placenta |
l.n.01') |
| |
Synset('mammal.n. |
01') |
| |
Synset('vertebra |
te.n.01') |
| |
Synset('chordate. Synset('domestic
n.01') _animal.n.01')
| |
Synset('animal.n. Synset('animal.n.
01') 01')
| |
Synset('organism. Synset('organism.
n.01') n.01')
| |
Synset('living_t Synset('living_t
hing.n.01') hing.n.01')
| |
Synset('whole.n. Synset('whole.n.
02') 02')
| |
Synset('object.n. Synset('object.n.
01') 01')
| |
Synset('physical Synset('physical
_entity.n.01') _entity.n.01')
| |
Synset('entity.n. Synset('entity.n.
01') 01')
Parented Trees
==============
`ParentedTree` is a subclass of `Tree` that automatically maintains
parent pointers for single-parented trees. Parented trees can be
created directly from a node label and a list of children:
>>> ptree = (
... ParentedTree('VP', [
... ParentedTree('VERB', ['saw']),
... ParentedTree('NP', [
... ParentedTree('DET', ['the']),
... ParentedTree('NOUN', ['dog'])])]))
>>> print(ptree)
(VP (VERB saw) (NP (DET the) (NOUN dog)))
Parented trees can be created from strings using the classmethod
`ParentedTree.fromstring`:
>>> ptree = ParentedTree.fromstring('(VP (VERB saw) (NP (DET the) (NOUN dog)))')
>>> print(ptree)
(VP (VERB saw) (NP (DET the) (NOUN dog)))
>>> print(type(ptree))
<class 'nltk.tree.parented.ParentedTree'>
Parented trees can also be created by using the classmethod
`ParentedTree.convert` to convert another type of tree to a parented
tree:
>>> tree = Tree.fromstring('(VP (VERB saw) (NP (DET the) (NOUN dog)))')
>>> ptree = ParentedTree.convert(tree)
>>> print(ptree)
(VP (VERB saw) (NP (DET the) (NOUN dog)))
>>> print(type(ptree))
<class 'nltk.tree.parented.ParentedTree'>
.. clean-up:
>>> del tree
`ParentedTree`\ s should never be used in the same tree as `Tree`\ s
or `MultiParentedTree`\ s. Mixing tree implementations may result in
incorrect parent pointers and in `TypeError` exceptions:
>>> # Inserting a Tree in a ParentedTree gives an exception:
>>> ParentedTree('NP', [
... Tree('DET', ['the']), Tree('NOUN', ['dog'])])
Traceback (most recent call last):
. . .
TypeError: Can not insert a non-ParentedTree into a ParentedTree
>>> # inserting a ParentedTree in a Tree gives incorrect parent pointers:
>>> broken_tree = Tree('NP', [
... ParentedTree('DET', ['the']), ParentedTree('NOUN', ['dog'])])
>>> print(broken_tree[0].parent())
None
Parented Tree Methods
------------------------
In addition to all the methods defined by the `Tree` class, the
`ParentedTree` class adds six new methods whose values are
automatically updated whenever a parented tree is modified: `parent()`,
`parent_index()`, `left_sibling()`, `right_sibling()`, `root()`, and
`treeposition()`.
The `parent()` method contains a `ParentedTree`\ 's parent, if it has
one; and ``None`` otherwise. `ParentedTree`\ s that do not have
parents are known as "root trees."
>>> for subtree in ptree.subtrees():
... print(subtree)
... print(' Parent = %s' % subtree.parent())
(VP (VERB saw) (NP (DET the) (NOUN dog)))
Parent = None
(VERB saw)
Parent = (VP (VERB saw) (NP (DET the) (NOUN dog)))
(NP (DET the) (NOUN dog))
Parent = (VP (VERB saw) (NP (DET the) (NOUN dog)))
(DET the)
Parent = (NP (DET the) (NOUN dog))
(NOUN dog)
Parent = (NP (DET the) (NOUN dog))
The `parent_index()` method stores the index of a tree in its parent's
child list. If a tree does not have a parent, then its `parent_index`
is ``None``.
>>> for subtree in ptree.subtrees():
... print(subtree)
... print(' Parent Index = %s' % subtree.parent_index())
... assert (subtree.parent() is None or
... subtree.parent()[subtree.parent_index()] is subtree)
(VP (VERB saw) (NP (DET the) (NOUN dog)))
Parent Index = None
(VERB saw)
Parent Index = 0
(NP (DET the) (NOUN dog))
Parent Index = 1
(DET the)
Parent Index = 0
(NOUN dog)
Parent Index = 1
Note that ``ptree.parent().index(ptree)`` is *not* equivalent to
``ptree.parent_index()``. In particular, ``ptree.parent().index(ptree)``
will return the index of the first child of ``ptree.parent()`` that is
equal to ``ptree`` (using ``==``); and that child may not be
``ptree``:
>>> on_and_on = ParentedTree('CONJP', [
... ParentedTree('PREP', ['on']),
... ParentedTree('COJN', ['and']),
... ParentedTree('PREP', ['on'])])
>>> second_on = on_and_on[2]
>>> print(second_on.parent_index())
2
>>> print(second_on.parent().index(second_on))
0
The methods `left_sibling()` and `right_sibling()` can be used to get a
parented tree's siblings. If a tree does not have a left or right
sibling, then the corresponding method's value is ``None``:
>>> for subtree in ptree.subtrees():
... print(subtree)
... print(' Left Sibling = %s' % subtree.left_sibling())
... print(' Right Sibling = %s' % subtree.right_sibling())
(VP (VERB saw) (NP (DET the) (NOUN dog)))
Left Sibling = None
Right Sibling = None
(VERB saw)
Left Sibling = None
Right Sibling = (NP (DET the) (NOUN dog))
(NP (DET the) (NOUN dog))
Left Sibling = (VERB saw)
Right Sibling = None
(DET the)
Left Sibling = None
Right Sibling = (NOUN dog)
(NOUN dog)
Left Sibling = (DET the)
Right Sibling = None
A parented tree's root tree can be accessed using the `root()`
method. This method follows the tree's parent pointers until it
finds a tree without a parent. If a tree does not have a parent, then
it is its own root:
>>> for subtree in ptree.subtrees():
... print(subtree)
... print(' Root = %s' % subtree.root())
(VP (VERB saw) (NP (DET the) (NOUN dog)))
Root = (VP (VERB saw) (NP (DET the) (NOUN dog)))
(VERB saw)
Root = (VP (VERB saw) (NP (DET the) (NOUN dog)))
(NP (DET the) (NOUN dog))
Root = (VP (VERB saw) (NP (DET the) (NOUN dog)))
(DET the)
Root = (VP (VERB saw) (NP (DET the) (NOUN dog)))
(NOUN dog)
Root = (VP (VERB saw) (NP (DET the) (NOUN dog)))
The `treeposition()` method can be used to find a tree's treeposition
relative to its root:
>>> for subtree in ptree.subtrees():
... print(subtree)
... print(' Tree Position = %s' % (subtree.treeposition(),))
... assert subtree.root()[subtree.treeposition()] is subtree
(VP (VERB saw) (NP (DET the) (NOUN dog)))
Tree Position = ()
(VERB saw)
Tree Position = (0,)
(NP (DET the) (NOUN dog))
Tree Position = (1,)
(DET the)
Tree Position = (1, 0)
(NOUN dog)
Tree Position = (1, 1)
Whenever a parented tree is modified, all of the methods described
above (`parent()`, `parent_index()`, `left_sibling()`, `right_sibling()`,
`root()`, and `treeposition()`) are automatically updated. For example,
if we replace ``ptree``\ 's subtree for the word "dog" with a new
subtree for "cat," the method values for both the "dog" subtree and the
"cat" subtree get automatically updated:
>>> # Replace the dog with a cat
>>> dog = ptree[1,1]
>>> cat = ParentedTree('NOUN', ['cat'])
>>> ptree[1,1] = cat
>>> # the noun phrase is no longer the dog's parent:
>>> print(dog.parent(), dog.parent_index(), dog.left_sibling())
None None None
>>> # dog is now its own root.
>>> print(dog.root())
(NOUN dog)
>>> print(dog.treeposition())
()
>>> # the cat's parent is now the noun phrase:
>>> print(cat.parent())
(NP (DET the) (NOUN cat))
>>> print(cat.parent_index())
1
>>> print(cat.left_sibling())
(DET the)
>>> print(cat.root())
(VP (VERB saw) (NP (DET the) (NOUN cat)))
>>> print(cat.treeposition())
(1, 1)
ParentedTree Regression Tests
-----------------------------
Keep track of all trees that we create (including subtrees) using this
variable:
>>> all_ptrees = []
Define a helper function to create new parented trees:
>>> def make_ptree(s):
... ptree = ParentedTree.convert(Tree.fromstring(s))
... all_ptrees.extend(t for t in ptree.subtrees()
... if isinstance(t, Tree))
... return ptree
Define a test function that examines every subtree in all_ptrees; and
checks that all six of its methods are defined correctly. If any
ptrees are passed as arguments, then they are printed.
>>> def pcheck(*print_ptrees):
... for ptree in all_ptrees:
... # Check ptree's methods.
... if ptree.parent() is not None:
... i = ptree.parent_index()
... assert ptree.parent()[i] is ptree
... if i > 0:
... assert ptree.left_sibling() is ptree.parent()[i-1]
... if i < (len(ptree.parent())-1):
... assert ptree.right_sibling() is ptree.parent()[i+1]
... assert len(ptree.treeposition()) > 0
... assert (ptree.treeposition() ==
... ptree.parent().treeposition() + (ptree.parent_index(),))
... assert ptree.root() is not ptree
... assert ptree.root() is not None
... assert ptree.root() is ptree.parent().root()
... assert ptree.root()[ptree.treeposition()] is ptree
... else:
... assert ptree.parent_index() is None
... assert ptree.left_sibling() is None
... assert ptree.right_sibling() is None
... assert ptree.root() is ptree
... assert ptree.treeposition() == ()
... # Check ptree's children's methods:
... for i, child in enumerate(ptree):
... if isinstance(child, Tree):
... # pcheck parent() & parent_index() methods
... assert child.parent() is ptree
... assert child.parent_index() == i
... # pcheck sibling methods
... if i == 0:
... assert child.left_sibling() is None
... else:
... assert child.left_sibling() is ptree[i-1]
... if i == len(ptree)-1:
... assert child.right_sibling() is None
... else:
... assert child.right_sibling() is ptree[i+1]
... if print_ptrees:
... print('ok!', end=' ')
... for ptree in print_ptrees: print(ptree)
... else:
... print('ok!')
Run our test function on a variety of newly-created trees:
>>> pcheck(make_ptree('(A)'))
ok! (A )
>>> pcheck(make_ptree('(A (B (C (D) (E f)) g) h)'))
ok! (A (B (C (D ) (E f)) g) h)
>>> pcheck(make_ptree('(A (B) (C c) (D d d) (E e e e))'))
ok! (A (B ) (C c) (D d d) (E e e e))
>>> pcheck(make_ptree('(A (B) (C (c)) (D (d) (d)) (E (e) (e) (e)))'))
ok! (A (B ) (C (c )) (D (d ) (d )) (E (e ) (e ) (e )))
Run our test function after performing various tree-modification
operations:
**__delitem__()**
>>> ptree = make_ptree('(A (B (C (D) (E f) (Q p)) g) h)')
>>> e = ptree[0,0,1]
>>> del ptree[0,0,1]; pcheck(ptree); pcheck(e)
ok! (A (B (C (D ) (Q p)) g) h)
ok! (E f)
>>> del ptree[0,0,0]; pcheck(ptree)
ok! (A (B (C (Q p)) g) h)
>>> del ptree[0,1]; pcheck(ptree)
ok! (A (B (C (Q p))) h)
>>> del ptree[-1]; pcheck(ptree)
ok! (A (B (C (Q p))))
>>> del ptree[-100]
Traceback (most recent call last):
. . .
IndexError: index out of range
>>> del ptree[()]
Traceback (most recent call last):
. . .
IndexError: The tree position () may not be deleted.
>>> # With slices:
>>> ptree = make_ptree('(A (B c) (D e) f g (H i) j (K l))')
>>> b = ptree[0]
>>> del ptree[0:0]; pcheck(ptree)
ok! (A (B c) (D e) f g (H i) j (K l))
>>> del ptree[:1]; pcheck(ptree); pcheck(b)
ok! (A (D e) f g (H i) j (K l))
ok! (B c)
>>> del ptree[-2:]; pcheck(ptree)
ok! (A (D e) f g (H i))
>>> del ptree[1:3]; pcheck(ptree)
ok! (A (D e) (H i))
>>> ptree = make_ptree('(A (B c) (D e) f g (H i) j (K l))')
>>> del ptree[5:1000]; pcheck(ptree)
ok! (A (B c) (D e) f g (H i))
>>> del ptree[-2:1000]; pcheck(ptree)
ok! (A (B c) (D e) f)
>>> del ptree[-100:1]; pcheck(ptree)
ok! (A (D e) f)
>>> ptree = make_ptree('(A (B c) (D e) f g (H i) j (K l))')
>>> del ptree[1:-2:2]; pcheck(ptree)
ok! (A (B c) f (H i) j (K l))
**__setitem__()**
>>> ptree = make_ptree('(A (B (C (D) (E f) (Q p)) g) h)')
>>> d, e, q = ptree[0,0]
>>> ptree[0,0,0] = 'x'; pcheck(ptree); pcheck(d)
ok! (A (B (C x (E f) (Q p)) g) h)
ok! (D )
>>> ptree[0,0,1] = make_ptree('(X (Y z))'); pcheck(ptree); pcheck(e)
ok! (A (B (C x (X (Y z)) (Q p)) g) h)
ok! (E f)
>>> ptree[1] = d; pcheck(ptree)
ok! (A (B (C x (X (Y z)) (Q p)) g) (D ))
>>> ptree[-1] = 'x'; pcheck(ptree)
ok! (A (B (C x (X (Y z)) (Q p)) g) x)
>>> ptree[-100] = 'y'
Traceback (most recent call last):
. . .
IndexError: index out of range
>>> ptree[()] = make_ptree('(X y)')
Traceback (most recent call last):
. . .
IndexError: The tree position () may not be assigned to.
>>> # With slices:
>>> ptree = make_ptree('(A (B c) (D e) f g (H i) j (K l))')
>>> b = ptree[0]
>>> ptree[0:0] = ('x', make_ptree('(Y)')); pcheck(ptree)
ok! (A x (Y ) (B c) (D e) f g (H i) j (K l))
>>> ptree[2:6] = (); pcheck(ptree); pcheck(b)
ok! (A x (Y ) (H i) j (K l))
ok! (B c)
>>> ptree[-2:] = ('z', 'p'); pcheck(ptree)
ok! (A x (Y ) (H i) z p)
>>> ptree[1:3] = [make_ptree('(X)') for x in range(10)]; pcheck(ptree)
ok! (A x (X ) (X ) (X ) (X ) (X ) (X ) (X ) (X ) (X ) (X ) z p)
>>> ptree[5:1000] = []; pcheck(ptree)
ok! (A x (X ) (X ) (X ) (X ))
>>> ptree[-2:1000] = ['n']; pcheck(ptree)
ok! (A x (X ) (X ) n)
>>> ptree[-100:1] = [make_ptree('(U v)')]; pcheck(ptree)
ok! (A (U v) (X ) (X ) n)
>>> ptree[-1:] = (make_ptree('(X)') for x in range(3)); pcheck(ptree)
ok! (A (U v) (X ) (X ) (X ) (X ) (X ))
>>> ptree[1:-2:2] = ['x', 'y']; pcheck(ptree)
ok! (A (U v) x (X ) y (X ) (X ))
**append()**
>>> ptree = make_ptree('(A (B (C (D) (E f) (Q p)) g) h)')
>>> ptree.append('x'); pcheck(ptree)
ok! (A (B (C (D ) (E f) (Q p)) g) h x)
>>> ptree.append(make_ptree('(X (Y z))')); pcheck(ptree)
ok! (A (B (C (D ) (E f) (Q p)) g) h x (X (Y z)))
**extend()**
>>> ptree = make_ptree('(A (B (C (D) (E f) (Q p)) g) h)')
>>> ptree.extend(['x', 'y', make_ptree('(X (Y z))')]); pcheck(ptree)
ok! (A (B (C (D ) (E f) (Q p)) g) h x y (X (Y z)))
>>> ptree.extend([]); pcheck(ptree)
ok! (A (B (C (D ) (E f) (Q p)) g) h x y (X (Y z)))
>>> ptree.extend(make_ptree('(X)') for x in range(3)); pcheck(ptree)
ok! (A (B (C (D ) (E f) (Q p)) g) h x y (X (Y z)) (X ) (X ) (X ))
**insert()**
>>> ptree = make_ptree('(A (B (C (D) (E f) (Q p)) g) h)')
>>> ptree.insert(0, make_ptree('(X (Y z))')); pcheck(ptree)
ok! (A (X (Y z)) (B (C (D ) (E f) (Q p)) g) h)
>>> ptree.insert(-1, make_ptree('(X (Y z))')); pcheck(ptree)
ok! (A (X (Y z)) (B (C (D ) (E f) (Q p)) g) (X (Y z)) h)
>>> ptree.insert(-4, make_ptree('(X (Y z))')); pcheck(ptree)
ok! (A (X (Y z)) (X (Y z)) (B (C (D ) (E f) (Q p)) g) (X (Y z)) h)
>>> # Note: as with ``list``, inserting at a negative index that
>>> # gives a position before the start of the list does *not*
>>> # raise an IndexError exception; it just inserts at 0.
>>> ptree.insert(-400, make_ptree('(X (Y z))')); pcheck(ptree)
ok! (A
(X (Y z))
(X (Y z))
(X (Y z))
(B (C (D ) (E f) (Q p)) g)
(X (Y z))
h)
**pop()**
>>> ptree = make_ptree('(A (B (C (D) (E f) (Q p)) g) h)')
>>> ptree[0,0].pop(1); pcheck(ptree)
ParentedTree('E', ['f'])
ok! (A (B (C (D ) (Q p)) g) h)
>>> ptree[0].pop(-1); pcheck(ptree)
'g'
ok! (A (B (C (D ) (Q p))) h)
>>> ptree.pop(); pcheck(ptree)
'h'
ok! (A (B (C (D ) (Q p))))
>>> ptree.pop(-100)
Traceback (most recent call last):
. . .
IndexError: index out of range
**remove()**
>>> ptree = make_ptree('(A (B (C (D) (E f) (Q p)) g) h)')
>>> e = ptree[0,0,1]
>>> ptree[0,0].remove(ptree[0,0,1]); pcheck(ptree); pcheck(e)
ok! (A (B (C (D ) (Q p)) g) h)
ok! (E f)
>>> ptree[0,0].remove(make_ptree('(Q p)')); pcheck(ptree)
ok! (A (B (C (D )) g) h)
>>> ptree[0,0].remove(make_ptree('(Q p)'))
Traceback (most recent call last):
. . .
ValueError: ParentedTree('Q', ['p']) is not in list
>>> ptree.remove('h'); pcheck(ptree)
ok! (A (B (C (D )) g))
>>> ptree.remove('h');
Traceback (most recent call last):
. . .
ValueError: 'h' is not in list
>>> # remove() removes the first subtree that is equal (==) to the
>>> # given tree, which may not be the identical tree we give it:
>>> ptree = make_ptree('(A (X x) (Y y) (X x))')
>>> x1, y, x2 = ptree
>>> ptree.remove(ptree[-1]); pcheck(ptree)
ok! (A (Y y) (X x))
>>> print(x1.parent()); pcheck(x1)
None
ok! (X x)
>>> print(x2.parent())
(A (Y y) (X x))
Test that a tree can not be given multiple parents:
>>> ptree = make_ptree('(A (X x) (Y y) (Z z))')
>>> ptree[0] = ptree[1]
Traceback (most recent call last):
. . .
ValueError: Can not insert a subtree that already has a parent.
>>> pcheck()
ok!
[more to be written]
Shallow copying can be tricky for Tree and several of its subclasses.
For shallow copies of Tree, only the root node is reconstructed, while
all the children are shared between the two trees. Modify the children
of one tree - and the shallowly copied tree will also update.
>>> from nltk.tree import Tree, ParentedTree, MultiParentedTree
>>> tree = Tree.fromstring("(TOP (S (NP (NNP Bell,)) (NP (NP (DT a) (NN company)) (SBAR (WHNP (WDT which)) (S (VP (VBZ is) (VP (VBN based) (PP (IN in) (NP (NNP LA,)))))))) (VP (VBZ makes) (CC and) (VBZ distributes) (NP (NN computer))) (. products.)))")
>>> copy_tree = tree.copy(deep=False)
>>> tree == copy_tree # Ensure identical labels and nodes
True
>>> id(copy_tree[0]) == id(tree[0]) # Ensure shallow copy - the children are the same objects in memory
True
For ParentedTree objects, this behaviour is not possible. With a shallow
copy, the children of the root node would be reused for both the original
and the shallow copy. For this to be possible, some children would need
to have multiple parents. As this is forbidden for ParentedTree objects,
attempting to make a shallow copy will cause a warning, and a deep copy
is made instead.
>>> ptree = ParentedTree.fromstring("(TOP (S (NP (NNP Bell,)) (NP (NP (DT a) (NN company)) (SBAR (WHNP (WDT which)) (S (VP (VBZ is) (VP (VBN based) (PP (IN in) (NP (NNP LA,)))))))) (VP (VBZ makes) (CC and) (VBZ distributes) (NP (NN computer))) (. products.)))")
>>> copy_ptree = ptree.copy(deep=False)
>>> copy_ptree == ptree # Ensure identical labels and nodes
True
>>> id(copy_ptree[0]) != id(ptree[0]) # Shallow copying isn't supported - it defaults to deep copy.
True
For MultiParentedTree objects, the issue of only allowing one parent that
can be seen for ParentedTree objects is no more. Shallow copying a
MultiParentedTree gives the children of the root node two parents:
the original and the newly copied root.
>>> mptree = MultiParentedTree.fromstring("(TOP (S (NP (NNP Bell,)) (NP (NP (DT a) (NN company)) (SBAR (WHNP (WDT which)) (S (VP (VBZ is) (VP (VBN based) (PP (IN in) (NP (NNP LA,)))))))) (VP (VBZ makes) (CC and) (VBZ distributes) (NP (NN computer))) (. products.)))")
>>> len(mptree[0].parents())
1
>>> copy_mptree = mptree.copy(deep=False)
>>> copy_mptree == mptree # Ensure identical labels and nodes
True
>>> len(mptree[0].parents())
2
>>> len(copy_mptree[0].parents())
2
Shallow copying a MultiParentedTree is similar to creating a second root
which is identically labeled as the root on which the copy method was called.
ImmutableParentedTree Regression Tests
--------------------------------------
>>> iptree = ImmutableParentedTree.convert(ptree)
>>> type(iptree)
<class 'nltk.tree.immutable.ImmutableParentedTree'>
>>> del iptree[0]
Traceback (most recent call last):
. . .
ValueError: ImmutableParentedTree may not be modified
>>> iptree.set_label('newnode')
Traceback (most recent call last):
. . .
ValueError: ImmutableParentedTree may not be modified
MultiParentedTree Regression Tests
----------------------------------
Keep track of all trees that we create (including subtrees) using this
variable:
>>> all_mptrees = []
Define a helper function to create new parented trees:
>>> def make_mptree(s):
... mptree = MultiParentedTree.convert(Tree.fromstring(s))
... all_mptrees.extend(t for t in mptree.subtrees()
... if isinstance(t, Tree))
... return mptree
Define a test function that examines every subtree in all_mptrees; and
checks that all six of its methods are defined correctly. If any
mptrees are passed as arguments, then they are printed.
>>> def mpcheck(*print_mptrees):
... def has(seq, val): # uses identity comparison
... for item in seq:
... if item is val: return True
... return False
... for mptree in all_mptrees:
... # Check mptree's methods.
... if len(mptree.parents()) == 0:
... assert len(mptree.left_siblings()) == 0
... assert len(mptree.right_siblings()) == 0
... assert len(mptree.roots()) == 1
... assert mptree.roots()[0] is mptree
... assert mptree.treepositions(mptree) == [()]
... left_siblings = right_siblings = ()
... roots = {id(mptree): 1}
... else:
... roots = dict((id(r), 0) for r in mptree.roots())
... left_siblings = mptree.left_siblings()
... right_siblings = mptree.right_siblings()
... for parent in mptree.parents():
... for i in mptree.parent_indices(parent):
... assert parent[i] is mptree
... # check left siblings
... if i > 0:
... for j in range(len(left_siblings)):
... if left_siblings[j] is parent[i-1]:
... del left_siblings[j]
... break
... else:
... assert 0, 'sibling not found!'
... # check ight siblings
... if i < (len(parent)-1):
... for j in range(len(right_siblings)):
... if right_siblings[j] is parent[i+1]:
... del right_siblings[j]
... break
... else:
... assert 0, 'sibling not found!'
... # check roots
... for root in parent.roots():
... assert id(root) in roots, 'missing root'
... roots[id(root)] += 1
... # check that we don't have any unexplained values
... assert len(left_siblings)==0, 'unexpected sibling'
... assert len(right_siblings)==0, 'unexpected sibling'
... for v in roots.values(): assert v>0, roots #'unexpected root'
... # check treepositions
... for root in mptree.roots():
... for treepos in mptree.treepositions(root):
... assert root[treepos] is mptree
... # Check mptree's children's methods:
... for i, child in enumerate(mptree):
... if isinstance(child, Tree):
... # mpcheck parent() & parent_index() methods
... assert has(child.parents(), mptree)
... assert i in child.parent_indices(mptree)
... # mpcheck sibling methods
... if i > 0:
... assert has(child.left_siblings(), mptree[i-1])
... if i < len(mptree)-1:
... assert has(child.right_siblings(), mptree[i+1])
... if print_mptrees:
... print('ok!', end=' ')
... for mptree in print_mptrees: print(mptree)
... else:
... print('ok!')
Run our test function on a variety of newly-created trees:
>>> mpcheck(make_mptree('(A)'))
ok! (A )
>>> mpcheck(make_mptree('(A (B (C (D) (E f)) g) h)'))
ok! (A (B (C (D ) (E f)) g) h)
>>> mpcheck(make_mptree('(A (B) (C c) (D d d) (E e e e))'))
ok! (A (B ) (C c) (D d d) (E e e e))
>>> mpcheck(make_mptree('(A (B) (C (c)) (D (d) (d)) (E (e) (e) (e)))'))
ok! (A (B ) (C (c )) (D (d ) (d )) (E (e ) (e ) (e )))
>>> subtree = make_mptree('(A (B (C (D) (E f)) g) h)')
Including some trees that contain multiple parents:
>>> mpcheck(MultiParentedTree('Z', [subtree, subtree]))
ok! (Z (A (B (C (D ) (E f)) g) h) (A (B (C (D ) (E f)) g) h))
Run our test function after performing various tree-modification
operations (n.b., these are the same tests that we ran for
`ParentedTree`, above; thus, none of these trees actually *uses*
multiple parents.)
**__delitem__()**
>>> mptree = make_mptree('(A (B (C (D) (E f) (Q p)) g) h)')
>>> e = mptree[0,0,1]
>>> del mptree[0,0,1]; mpcheck(mptree); mpcheck(e)
ok! (A (B (C (D ) (Q p)) g) h)
ok! (E f)
>>> del mptree[0,0,0]; mpcheck(mptree)
ok! (A (B (C (Q p)) g) h)
>>> del mptree[0,1]; mpcheck(mptree)
ok! (A (B (C (Q p))) h)
>>> del mptree[-1]; mpcheck(mptree)
ok! (A (B (C (Q p))))
>>> del mptree[-100]
Traceback (most recent call last):
. . .
IndexError: index out of range
>>> del mptree[()]
Traceback (most recent call last):
. . .
IndexError: The tree position () may not be deleted.
>>> # With slices:
>>> mptree = make_mptree('(A (B c) (D e) f g (H i) j (K l))')
>>> b = mptree[0]
>>> del mptree[0:0]; mpcheck(mptree)
ok! (A (B c) (D e) f g (H i) j (K l))
>>> del mptree[:1]; mpcheck(mptree); mpcheck(b)
ok! (A (D e) f g (H i) j (K l))
ok! (B c)
>>> del mptree[-2:]; mpcheck(mptree)
ok! (A (D e) f g (H i))
>>> del mptree[1:3]; mpcheck(mptree)
ok! (A (D e) (H i))
>>> mptree = make_mptree('(A (B c) (D e) f g (H i) j (K l))')
>>> del mptree[5:1000]; mpcheck(mptree)
ok! (A (B c) (D e) f g (H i))
>>> del mptree[-2:1000]; mpcheck(mptree)
ok! (A (B c) (D e) f)
>>> del mptree[-100:1]; mpcheck(mptree)
ok! (A (D e) f)
>>> mptree = make_mptree('(A (B c) (D e) f g (H i) j (K l))')
>>> del mptree[1:-2:2]; mpcheck(mptree)
ok! (A (B c) f (H i) j (K l))
**__setitem__()**
>>> mptree = make_mptree('(A (B (C (D) (E f) (Q p)) g) h)')
>>> d, e, q = mptree[0,0]
>>> mptree[0,0,0] = 'x'; mpcheck(mptree); mpcheck(d)
ok! (A (B (C x (E f) (Q p)) g) h)
ok! (D )
>>> mptree[0,0,1] = make_mptree('(X (Y z))'); mpcheck(mptree); mpcheck(e)
ok! (A (B (C x (X (Y z)) (Q p)) g) h)
ok! (E f)
>>> mptree[1] = d; mpcheck(mptree)
ok! (A (B (C x (X (Y z)) (Q p)) g) (D ))
>>> mptree[-1] = 'x'; mpcheck(mptree)
ok! (A (B (C x (X (Y z)) (Q p)) g) x)
>>> mptree[-100] = 'y'
Traceback (most recent call last):
. . .
IndexError: index out of range
>>> mptree[()] = make_mptree('(X y)')
Traceback (most recent call last):
. . .
IndexError: The tree position () may not be assigned to.
>>> # With slices:
>>> mptree = make_mptree('(A (B c) (D e) f g (H i) j (K l))')
>>> b = mptree[0]
>>> mptree[0:0] = ('x', make_mptree('(Y)')); mpcheck(mptree)
ok! (A x (Y ) (B c) (D e) f g (H i) j (K l))
>>> mptree[2:6] = (); mpcheck(mptree); mpcheck(b)
ok! (A x (Y ) (H i) j (K l))
ok! (B c)
>>> mptree[-2:] = ('z', 'p'); mpcheck(mptree)
ok! (A x (Y ) (H i) z p)
>>> mptree[1:3] = [make_mptree('(X)') for x in range(10)]; mpcheck(mptree)
ok! (A x (X ) (X ) (X ) (X ) (X ) (X ) (X ) (X ) (X ) (X ) z p)
>>> mptree[5:1000] = []; mpcheck(mptree)
ok! (A x (X ) (X ) (X ) (X ))
>>> mptree[-2:1000] = ['n']; mpcheck(mptree)
ok! (A x (X ) (X ) n)
>>> mptree[-100:1] = [make_mptree('(U v)')]; mpcheck(mptree)
ok! (A (U v) (X ) (X ) n)
>>> mptree[-1:] = (make_mptree('(X)') for x in range(3)); mpcheck(mptree)
ok! (A (U v) (X ) (X ) (X ) (X ) (X ))
>>> mptree[1:-2:2] = ['x', 'y']; mpcheck(mptree)
ok! (A (U v) x (X ) y (X ) (X ))
**append()**
>>> mptree = make_mptree('(A (B (C (D) (E f) (Q p)) g) h)')
>>> mptree.append('x'); mpcheck(mptree)
ok! (A (B (C (D ) (E f) (Q p)) g) h x)
>>> mptree.append(make_mptree('(X (Y z))')); mpcheck(mptree)
ok! (A (B (C (D ) (E f) (Q p)) g) h x (X (Y z)))
**extend()**
>>> mptree = make_mptree('(A (B (C (D) (E f) (Q p)) g) h)')
>>> mptree.extend(['x', 'y', make_mptree('(X (Y z))')]); mpcheck(mptree)
ok! (A (B (C (D ) (E f) (Q p)) g) h x y (X (Y z)))
>>> mptree.extend([]); mpcheck(mptree)
ok! (A (B (C (D ) (E f) (Q p)) g) h x y (X (Y z)))
>>> mptree.extend(make_mptree('(X)') for x in range(3)); mpcheck(mptree)
ok! (A (B (C (D ) (E f) (Q p)) g) h x y (X (Y z)) (X ) (X ) (X ))
**insert()**
>>> mptree = make_mptree('(A (B (C (D) (E f) (Q p)) g) h)')
>>> mptree.insert(0, make_mptree('(X (Y z))')); mpcheck(mptree)
ok! (A (X (Y z)) (B (C (D ) (E f) (Q p)) g) h)
>>> mptree.insert(-1, make_mptree('(X (Y z))')); mpcheck(mptree)
ok! (A (X (Y z)) (B (C (D ) (E f) (Q p)) g) (X (Y z)) h)
>>> mptree.insert(-4, make_mptree('(X (Y z))')); mpcheck(mptree)
ok! (A (X (Y z)) (X (Y z)) (B (C (D ) (E f) (Q p)) g) (X (Y z)) h)
>>> # Note: as with ``list``, inserting at a negative index that
>>> # gives a position before the start of the list does *not*
>>> # raise an IndexError exception; it just inserts at 0.
>>> mptree.insert(-400, make_mptree('(X (Y z))')); mpcheck(mptree)
ok! (A
(X (Y z))
(X (Y z))
(X (Y z))
(B (C (D ) (E f) (Q p)) g)
(X (Y z))
h)
**pop()**
>>> mptree = make_mptree('(A (B (C (D) (E f) (Q p)) g) h)')
>>> mptree[0,0].pop(1); mpcheck(mptree)
MultiParentedTree('E', ['f'])
ok! (A (B (C (D ) (Q p)) g) h)
>>> mptree[0].pop(-1); mpcheck(mptree)
'g'
ok! (A (B (C (D ) (Q p))) h)
>>> mptree.pop(); mpcheck(mptree)
'h'
ok! (A (B (C (D ) (Q p))))
>>> mptree.pop(-100)
Traceback (most recent call last):
. . .
IndexError: index out of range
**remove()**
>>> mptree = make_mptree('(A (B (C (D) (E f) (Q p)) g) h)')
>>> e = mptree[0,0,1]
>>> mptree[0,0].remove(mptree[0,0,1]); mpcheck(mptree); mpcheck(e)
ok! (A (B (C (D ) (Q p)) g) h)
ok! (E f)
>>> mptree[0,0].remove(make_mptree('(Q p)')); mpcheck(mptree)
ok! (A (B (C (D )) g) h)
>>> mptree[0,0].remove(make_mptree('(Q p)'))
Traceback (most recent call last):
. . .
ValueError: MultiParentedTree('Q', ['p']) is not in list
>>> mptree.remove('h'); mpcheck(mptree)
ok! (A (B (C (D )) g))
>>> mptree.remove('h');
Traceback (most recent call last):
. . .
ValueError: 'h' is not in list
>>> # remove() removes the first subtree that is equal (==) to the
>>> # given tree, which may not be the identical tree we give it:
>>> mptree = make_mptree('(A (X x) (Y y) (X x))')
>>> x1, y, x2 = mptree
>>> mptree.remove(mptree[-1]); mpcheck(mptree)
ok! (A (Y y) (X x))
>>> print([str(p) for p in x1.parents()])
[]
>>> print([str(p) for p in x2.parents()])
['(A (Y y) (X x))']
ImmutableMultiParentedTree Regression Tests
-------------------------------------------
>>> imptree = ImmutableMultiParentedTree.convert(mptree)
>>> type(imptree)
<class 'nltk.tree.immutable.ImmutableMultiParentedTree'>
>>> del imptree[0]
Traceback (most recent call last):
. . .
ValueError: ImmutableMultiParentedTree may not be modified
>>> imptree.set_label('newnode')
Traceback (most recent call last):
. . .
ValueError: ImmutableMultiParentedTree may not be modified
ProbabilisticTree Regression Tests
----------------------------------
>>> prtree = ProbabilisticTree("S", [ProbabilisticTree("NP", ["N"], prob=0.3)], prob=0.6)
>>> print(prtree)
(S (NP N)) (p=0.6)
>>> import copy
>>> prtree == copy.deepcopy(prtree) == prtree.copy(deep=True) == prtree.copy()
True
>>> prtree[0] is prtree.copy()[0]
True
>>> prtree[0] is prtree.copy(deep=True)[0]
False
>>> imprtree = ImmutableProbabilisticTree.convert(prtree)
>>> type(imprtree)
<class 'nltk.tree.immutable.ImmutableProbabilisticTree'>
>>> del imprtree[0]
Traceback (most recent call last):
. . .
ValueError: ImmutableProbabilisticTree may not be modified
>>> imprtree.set_label('newnode')
Traceback (most recent call last):
. . .
ValueError: ImmutableProbabilisticTree may not be modified
Squashed Bugs
=============
This used to discard the ``(B b)`` subtree (fixed in svn 6270):
>>> print(Tree.fromstring('((A a) (B b))'))
( (A a) (B b))
Pickling ParentedTree instances didn't work for Python 3.7 onwards (See #2478)
>>> import pickle
>>> tree = ParentedTree.fromstring('(S (NN x) (NP x) (NN x))')
>>> print(tree)
(S (NN x) (NP x) (NN x))
>>> pickled = pickle.dumps(tree)
>>> tree_loaded = pickle.loads(pickled)
>>> print(tree_loaded)
(S (NN x) (NP x) (NN x))
ParentedTree used to be impossible to (deep)copy. (See #1324)
>>> from nltk.tree import ParentedTree
>>> import copy
>>> tree = ParentedTree.fromstring("(TOP (S (NP (NNP Bell,)) (NP (NP (DT a) (NN company)) (SBAR (WHNP (WDT which)) (S (VP (VBZ is) (VP (VBN based) (PP (IN in) (NP (NNP LA,)))))))) (VP (VBZ makes) (CC and) (VBZ distributes) (NP (NN computer))) (. products.)))")
>>> tree == copy.deepcopy(tree) == copy.copy(tree) == tree.copy(deep=True) == tree.copy()
True