pdftools_toolbox.pdf.structure.node

Classes

Node(tag, document, page)

This class represents a structure element node in the structure element tree of a tagged PDF.

class pdftools_toolbox.pdf.structure.node.Node(tag: str, document: Document, page: Page | None)

Bases: _NativeObject

This class represents a structure element node in the structure element tree of a tagged PDF. Nodes may either have a collection of other nodes as children, or be associated with marked content. These two roles cannot be mixed.
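The rule that a node is either a container of child nodes or a holder of marked content, never both, can be illustrated with a small model. This is a minimal sketch in plain Python, not the toolbox API: `StructNode`, `RoleConflictError`, `add_child` and `associate_marked_content` are hypothetical names, and marked content is represented by integer IDs loosely modeled on PDF marked-content identifiers (MCIDs).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


class RoleConflictError(Exception):
    """Hypothetical error raised when a node would take on both roles."""


@dataclass
class StructNode:
    """Hypothetical stand-in for a structure element node (not the toolbox API)."""

    tag: str
    children: List[StructNode] = field(default_factory=list)
    marked_content_ids: List[int] = field(default_factory=list)
    # Excluded from repr/eq so that parent <-> child links do not recurse.
    parent: Optional[StructNode] = field(default=None, repr=False, compare=False)

    def add_child(self, child: StructNode) -> None:
        # A node that already owns marked content cannot become a container.
        if self.marked_content_ids:
            raise RoleConflictError(f"{self.tag} already has marked content")
        child.parent = self
        self.children.append(child)

    def associate_marked_content(self, mcid: int) -> None:
        # A node that already has children cannot own marked content.
        if self.children:
            raise RoleConflictError(f"{self.tag} already has child nodes")
        self.marked_content_ids.append(mcid)


sect = StructNode("Sect")
para = StructNode("P")
sect.add_child(para)              # Sect acts as a container
para.associate_marked_content(0)  # P acts as a leaf with marked content
try:
    sect.associate_marked_content(1)
except RoleConflictError as err:
    print(err)  # Sect already has child nodes
```

Raising at the point of the second role assignment, rather than validating the whole tree later, keeps the error close to the call that caused it; whether the toolbox reports the conflict the same way is not stated here.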

__init__(tag: str, document: Document, page: Page | None)
Parameters:
  • tag (str) – Tags should conform to the Standard Structure Types described within the PDF standard or refer to entries in the RoleMap. Allowed values from the PDF standard are: Document, Part, Sect, Art, Div, H1, H2, H3, H4, H5, H6, P, L, LI, Lbl, LBody, Table, TR, TH, TD, THead, TBody, TFoot, Span, Quote, Note, Reference, Figure, Caption, Artifact, Form, Field, Link, Code, Annot, Ruby, Warichu, TOC, TOCI, Index and BibEntry.

  • document (pdftools_toolbox.pdf.document.Document) – The document containing the structure element tree.

  • page (Optional[pdftools_toolbox.pdf.page.Page]) – The page on which the marked content associated with the structure element node is found. This parameter is optional and is best omitted for nodes that are not associated with marked content.

Raises:

StateError – if the object or the owning document has already been closed
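The `tag` parameter accepts either a standard structure type or a custom type defined in the RoleMap. Because the PDF specification allows a role map entry to point to another nonstandard type before reaching a standard one, a pre-check can follow the chain. This is a hypothetical helper sketched under assumptions: `resolve_tag` is not part of the toolbox, and `role_map` is a plain dict standing in for the document's RoleMap.

```python
from typing import Mapping, Optional

# Standard values listed for the ``tag`` parameter above.
STANDARD_TAGS = frozenset({
    "Document", "Part", "Sect", "Art", "Div",
    "H1", "H2", "H3", "H4", "H5", "H6", "P",
    "L", "LI", "Lbl", "LBody",
    "Table", "TR", "TH", "TD", "THead", "TBody", "TFoot",
    "Span", "Quote", "Note", "Reference", "Figure", "Caption",
    "Artifact", "Form", "Field", "Link", "Code", "Annot",
    "Ruby", "Warichu", "TOC", "TOCI", "Index", "BibEntry",
})


def resolve_tag(tag: str, role_map: Optional[Mapping[str, str]] = None) -> str:
    """Return the standard structure type that ``tag`` resolves to.

    Custom tags are followed through ``role_map`` (a hypothetical dict
    standing in for the RoleMap) until a standard type is reached.
    """
    role_map = role_map or {}
    seen = set()
    current = tag
    while current not in STANDARD_TAGS:
        # Stop on unmapped tags and on cyclic mappings such as A -> B -> A.
        if current in seen or current not in role_map:
            raise ValueError(f"tag {tag!r} does not resolve to a standard structure type")
        seen.add(current)
        current = role_map[current]
    return current


print(resolve_tag("P"))                                               # P
print(resolve_tag("Intro", {"Intro": "Heading1", "Heading1": "H1"}))  # H1
```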

property parent: Node

The parent node in the structure element tree.

Returns:

pdftools_toolbox.pdf.structure.node.Node

Raises:
  • StateError – if the object or the owning document has already been closed

  • OperationError – if the parent is the structure element tree root node
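Because `parent` raises `OperationError` rather than returning the tree root, code that walks up the tree can treat that exception as the stop condition. The sketch below models this in plain Python: `FakeNode`, `ancestor_tags` and the local `OperationError` class are hypothetical stand-ins, not the toolbox's own types.

```python
from typing import List, Optional


class OperationError(Exception):
    """Local stand-in for the toolbox's OperationError (assumption)."""


class FakeNode:
    """Hypothetical node whose ``parent`` mimics the documented behavior."""

    def __init__(self, tag: str, parent: Optional["FakeNode"] = None) -> None:
        self.tag = tag
        self._parent = parent

    @property
    def parent(self) -> "FakeNode":
        # Mirrors the documented contract: the tree root is never returned as a node.
        if self._parent is None:
            raise OperationError("parent is the structure element tree root")
        return self._parent


def ancestor_tags(node: FakeNode) -> List[str]:
    """Collect tags from the direct parent up to the topmost node."""
    tags: List[str] = []
    current = node
    while True:
        try:
            current = current.parent
        except OperationError:
            return tags  # reached the top of the tree
        tags.append(current.tag)


document = FakeNode("Document")
section = FakeNode("Sect", document)
paragraph = FakeNode("P", section)
print(ancestor_tags(paragraph))  # ['Sect', 'Document']
```

Catching the exception is used here because no "is root" query appears among the members documented on this page; if the real API offers one, checking it first would be the cleaner choice.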

property children: NodeList

The list of child nodes under this node in the structure element tree. Once child nodes have been added to a node, it can no longer be associated with marked content.

Returns:

pdftools_toolbox.pdf.structure.node_list.NodeList

Raises:

property tag: str

Tags should conform to the Standard Structure Types described within the PDF standard or refer to entries in the RoleMap.

Returns:

str

Raises:

property page: Page | None

The page on which the marked content associated with the structure element node is found. This property is optional and is best left unset for nodes that are not associated with marked content.

Returns:

Optional[pdftools_toolbox.pdf.page.Page]

Raises:

StateError – if the object or the owning document has already been closed

property alternate_text: str | None

Alternate text to be used where the content denoted by the structure element and its children cannot be rendered because of accessibility or other concerns.

Returns:

Optional[str]

Raises:

StateError – if the object or the owning document has already been closed
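Alternate text matters most for content without a textual equivalent, such as figures, and accessibility validators (for example, PDF/UA checkers) commonly report Figure elements that lack it. The sketch below collects such nodes with a depth-first walk. It uses a hypothetical `AltNode` dataclass, not the toolbox `Node`, and treats whitespace-only text as missing, which is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class AltNode:
    """Hypothetical node model carrying only what this check needs."""

    tag: str
    alternate_text: Optional[str] = None
    children: List["AltNode"] = field(default_factory=list)


def iter_nodes(node: AltNode) -> Iterator[AltNode]:
    """Depth-first, pre-order traversal of a node and its descendants."""
    yield node
    for child in node.children:
        yield from iter_nodes(child)


def figures_missing_alt_text(root: AltNode) -> List[AltNode]:
    """Return Figure nodes whose alternate text is absent or blank."""
    return [
        n for n in iter_nodes(root)
        if n.tag == "Figure" and not (n.alternate_text or "").strip()
    ]


doc_root = AltNode("Document", children=[
    AltNode("Figure", alternate_text="Company logo"),
    AltNode("Sect", children=[AltNode("Figure"), AltNode("P")]),
    AltNode("Figure", alternate_text="   "),
])
print(len(figures_missing_alt_text(doc_root)))  # 2
```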

property bounding_box: Rectangle | None

Bounding box of the node's contents. This should only be set for Figure, Formula and Table nodes.

Returns:

Optional[pdftools_toolbox.geometry.real.rectangle.Rectangle]

Raises:

StateError – if the object has already been closed
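Since a bounding box is only meant for Figure, Formula and Table nodes, a caller might validate it before assigning. This is a minimal sketch under assumptions: `Rect` is a hypothetical tuple of lower-left and upper-right corners, not the toolbox's `Rectangle` class, and the inverted-corner check is an extra safeguard added for illustration.

```python
from typing import NamedTuple, Optional

BBOX_TAGS = frozenset({"Figure", "Formula", "Table"})


class Rect(NamedTuple):
    """Hypothetical rectangle: lower-left (llx, lly) and upper-right (urx, ury)."""

    llx: float
    lly: float
    urx: float
    ury: float


def validate_bounding_box(tag: str, bbox: Optional[Rect]) -> None:
    """Raise ValueError if ``bbox`` is unsuitable for a node with ``tag``."""
    if bbox is None:
        return  # no bounding box set: nothing to check
    if tag not in BBOX_TAGS:
        raise ValueError(
            f"a bounding box should only be set on Figure, Formula or Table nodes, not {tag!r}"
        )
    # Extra assumption: reject rectangles whose corners are swapped.
    if bbox.urx < bbox.llx or bbox.ury < bbox.lly:
        raise ValueError("bounding box corners are inverted")


validate_bounding_box("Figure", Rect(72.0, 72.0, 300.0, 200.0))  # accepted
validate_bounding_box("P", None)                                  # accepted
try:
    validate_bounding_box("P", Rect(0.0, 0.0, 10.0, 10.0))
except ValueError as err:
    print(err)  # a bounding box should only be set on Figure, Formula or Table nodes, not 'P'
```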