pdftools_toolbox.pdf.content.image

Classes

Image(handle)

class pdftools_toolbox.pdf.content.image.Image(handle)

Bases: _NativeObject

static create(target_document: Document, stream: IOBase) → Image

Create an image object from image data.

Supported formats are:

  • BMP

  • DIB

  • JPEG

  • JPEG2000

  • JBIG2

  • PNG

  • GIF

The returned image object is not yet painted on any page, but it is associated with the given target document.
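
Before passing a stream to create, it can help to check that the data looks like one of the formats listed above. The following is a minimal sketch that uses only well-known file signatures; the helper name sniff_image_format is hypothetical and not part of pdftools_toolbox. DIB data has no file signature, so it is not detected here.

```python
from typing import Optional

# (signature, format name) pairs; DIB has no file signature and is omitted.
_SIGNATURES = [
    (b"BM", "BMP"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", "JPEG2000"),  # JP2 container
    (b"\xff\x4f\xff\x51", "JPEG2000"),                # raw codestream
    (b"\x97JB2\r\n\x1a\n", "JBIG2"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
]


def sniff_image_format(header: bytes) -> Optional[str]:
    """Return the format name matching the leading bytes, or None."""
    for magic, name in _SIGNATURES:
        if header.startswith(magic):
            return name
    return None
```

A caller would read the first few bytes of the stream, call the helper, and seek back to the start before handing the stream to create.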

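Parameters:

target_document (Document) – the document with which the returned image object is associated

stream (IOBase) – the image data, in one of the supported formats listed above
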
Returns:

the newly created image object

Return type:

pdftools_toolbox.pdf.content.image.Image

Raises:
extract(stream: IOBase, image_type: ImageType | None = None) → None

Extract embedded image from PDF

Extracts the embedded image from the PDF and writes it to the given stream in the format specified by image_type.

By default, the output format is determined by the embedded image data in the PDF. The default format can be queried through pdftools_toolbox.pdf.content.image.Image.default_image_type. Not all image types and conversions are supported, so using the default pdftools_toolbox.pdf.content.image_type.ImageType is recommended for best compatibility.

Key considerations include:

  • The image is extracted from the page’s resources, without any context from the PDF page. The resolution at which the image is shown and any modifications that influence its appearance on the page, such as scaling, rotation, or cropping, are therefore not preserved in the extracted image.

  • If a pdftools_toolbox.generic_error.GenericError is raised, the output file may be incomplete or unusable.

The method thus retrieves the image as it is stored in the PDF, without any page-specific modifications.
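
As an illustration of the default behaviour described above, the following sketch extracts an image into memory without passing image_type. The helper name extract_to_bytes is hypothetical. It assumes that an in-memory io.BytesIO is an acceptable IOBase target for extract; this is not confirmed by this reference.

```python
import io


def extract_to_bytes(image) -> bytes:
    """Extract an image in its default format and return the encoded bytes.

    `image` is expected to be a pdftools_toolbox Image; only the documented
    extract(stream, image_type=None) call is used.
    """
    buffer = io.BytesIO()
    # image_type is omitted, so the default image type is used.
    image.extract(buffer)
    return buffer.getvalue()
```

If extraction raises an error, the partially written data should be discarded, since the output may be unusable.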

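Parameters:

stream (IOBase) – the stream to which the extracted image is written

image_type (ImageType | None) – the format of the extracted image; if None, the default image type is used
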
Raises:
redact(rect: Rectangle) → None

Redact rectangular part of the image

Redacts the part of the image specified by a rectangle by changing the content of the image. This is not an annotation: the image data itself is changed, and the original data cannot be recovered from the image. All pixels in the redacted area are set to the same color. This color is generally black, but it depends on the color space of the image.

Parameters:

rect (pdftools_toolbox.geometry.real.rectangle.Rectangle) – Defines the rectangular part of the image that is to be redacted. If the rectangle is not completely within the image boundaries, only the part that lies within the boundaries is redacted.

Raises:

ValueError – if the rect argument is invalid
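
The clipping behaviour described for rect can be pictured with plain arithmetic. This is only a sketch of the idea, not the library's implementation: the helper clip_to_image is hypothetical, and it assumes a (left, bottom, right, top) tuple in the same units as the image width and height, with the image spanning from (0, 0) to (width, height).

```python
from typing import Optional, Tuple

Rect = Tuple[float, float, float, float]  # (left, bottom, right, top), assumed


def clip_to_image(rect: Rect, width: float, height: float) -> Optional[Rect]:
    """Return the part of rect inside the image, or None if there is none."""
    left, bottom, right, top = rect
    clipped_left = max(left, 0.0)
    clipped_bottom = max(bottom, 0.0)
    clipped_right = min(right, float(width))
    clipped_top = min(top, float(height))
    # An empty or inverted result means the rectangle misses the image.
    if clipped_left >= clipped_right or clipped_bottom >= clipped_top:
        return None
    return (clipped_left, clipped_bottom, clipped_right, clipped_top)
```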

property default_image_type: ImageType

Default extracted image type.

The default image type that will be extracted, based on the way that the image data is compressed and stored in the PDF file. The type of the output image is pdftools_toolbox.pdf.content.image_type.ImageType.JPEG for embedded JPEG and JPEG2000 images. In all other cases the image type will be pdftools_toolbox.pdf.content.image_type.ImageType.TIFF.
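
One use of this property is choosing a file extension before extracting. The sketch below assumes that ImageType members behave like Python Enum members with a name attribute (JPEG, TIFF); the helper default_extension and the ".bin" fallback are hypothetical.

```python
def default_extension(image) -> str:
    """Pick a file extension that matches the image's default image type."""
    # Assumption: default_image_type is an Enum member exposing `.name`.
    type_name = image.default_image_type.name
    extensions = {"JPEG": ".jpg", "TIFF": ".tiff"}
    # Fall back to a neutral extension for any type not covered here.
    return extensions.get(type_name, ".bin")
```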

Returns:

pdftools_toolbox.pdf.content.image_type.ImageType

Raises:

StateError – if the image has already been closed

property size: Size

The size of the image in samples.

Samples are often also called pixels.

Returns:

pdftools_toolbox.geometry.integer.size.Size

Raises:

StateError – if the image has already been closed

property samples: List[int]

The raw content of the image.

The samples (pixels) are given row by row from top to bottom, and within each row from left to right. Each sample is given component by component. There is no padding between components or samples, except that each row of sample data begins on a byte boundary. If the number of data bits per row is not a multiple of 8, the end of the row is padded with extra bits to fill out the last byte. Padding bits should be ignored.

Most often, each component is 8 bits, so there’s no packing/unpacking or alignment/padding. Components with 2 or 4 bits are very rare.
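
The row layout described above can be sketched as follows. Both helpers are hypothetical and work on plain integers: they assume that each list entry is one byte (0 to 255), that components are packed high-order bit first, and that the number of components per sample is known from the color space.

```python
from typing import List, Sequence


def row_stride_bytes(width: int, components: int, bits_per_component: int) -> int:
    """Bytes per row, including padding bits up to the next byte boundary."""
    row_bits = width * components * bits_per_component
    return (row_bits + 7) // 8


def unpack_row(row: Sequence[int], width: int, components: int,
               bits_per_component: int) -> List[int]:
    """Unpack one row of packed bytes into individual component values."""
    if bits_per_component not in (1, 2, 4, 8):
        raise ValueError("bits_per_component must be 1, 2, 4, or 8")
    mask = (1 << bits_per_component) - 1
    values = []
    bit_pos = 0
    for _ in range(width * components):
        byte = row[bit_pos // 8]
        # Components never straddle a byte for 1, 2, 4, or 8 bits.
        shift = 8 - bits_per_component - (bit_pos % 8)
        values.append((byte >> shift) & mask)
        bit_pos += bits_per_component
    return values  # padding bits at the end of the row are ignored
```

With 8 bits per component the stride is simply width times the number of components, and unpacking returns the bytes unchanged.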

If the image is compressed, it will be decompressed in order to get the samples. For very large images, this may take some time.

When setting samples, the original compression type of the image does not change. Compressing raw samples typically takes significantly longer than decompressing them, so setting the samples of a large image can be slow. None of the image parameters can be changed, so the size of the array being set must match that of the original image.
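
A minimal sketch of a read-modify-write cycle on samples, assuming that each list entry is one byte (0 to 255) and that the property can be assigned as described above. The helper invert_samples is hypothetical. It flips every bit, which keeps the array length unchanged; the visual effect depends on the color space.

```python
def invert_samples(image) -> None:
    """Flip every bit of the image's raw sample data in place."""
    original = image.samples  # may trigger decompression
    inverted = [byte ^ 0xFF for byte in original]
    # The new array must have exactly the same size as the original.
    if len(inverted) != len(original):
        raise ValueError("sample array size must not change")
    image.samples = inverted  # recompressed with the original compression
```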

Returns:

List[int]

Raises:

StateError – if the image has already been closed

property bits_per_component: int

The number of bits per component.

The number of bits used to represent each color component. Only a single value may be specified; the number of bits is the same for all color components. Valid values are 1, 2, 4, and 8.

Returns:

int

Raises:

StateError – if the image has already been closed

property color_space: ColorSpace

The color space in which image samples are specified.

Returns:

pdftools_toolbox.pdf.content.color_space.ColorSpace

Raises:

StateError – if the image has already been closed