pdftools_toolbox.pdf.content.image
Classes
- class pdftools_toolbox.pdf.content.image.Image(handle)
Bases: _NativeObject
- static create(target_document: Document, stream: IOBase) -> Image
Create an image object from image data.
Supported formats are:
BMP
DIB
JPEG
JPEG2000
JBIG2
PNG
GIF
The returned image object is not yet painted on any page, but it is associated with the given target document.
- Parameters:
target_document (pdftools_toolbox.pdf.document.Document) – the output document with which the returned object is associated
stream (io.IOBase) – the image data stream
- Returns:
the newly created image object
- Return type:
pdftools_toolbox.pdf.content.image.Image
- Raises:
OSError – Error reading from the image or writing to the document
pdftools_toolbox.unknown_format_error.UnknownFormatError – The image data has an unknown format
pdftools_toolbox.corrupt_error.CorruptError – The image data is corrupt
ValueError – if the target_document argument has already been closed
ValueError – if the target_document argument is read-only
ValueError – if the stream argument is None
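The format list above can be pre-checked before calling create. The following is a minimal sketch, not part of the library: sniff_image_format is a hypothetical helper that guesses the format from well-known leading byte signatures. DIB is left out because it has no fixed signature, and the library performs its own detection (raising UnknownFormatError), so this check is only a convenience.

```python
import io

# Leading byte signatures for the formats listed above. DIB is omitted: it is
# a bitmap without the BMP file header and has no fixed signature.
_SIGNATURES = [
    (b"BM", "BMP"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", "JPEG2000"),  # JP2 container
    (b"\xff\x4f\xff\x51", "JPEG2000"),                # raw codestream
    (b"\x97JB2\r\n\x1a\n", "JBIG2"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
]


def sniff_image_format(stream):
    """Hypothetical helper: guess the format name from the leading bytes.

    Returns None when no known signature matches. The stream position is
    restored so the same stream can be passed on afterwards.
    """
    start = stream.tell()
    header = stream.read(12)
    stream.seek(start)
    for signature, name in _SIGNATURES:
        if header.startswith(signature):
            return name
    return None
```

A caller could open the file in binary mode, call sniff_image_format(stream), and pass the stream to Image.create(target_document, stream) only when the result is not None.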
- extract(stream: IOBase, image_type: ImageType | None = None) -> None
Extract the embedded image from the PDF.
Extracts the image and writes it to stream in the image_type format.
By default, the format of the extracted image is determined by the embedded image data in the PDF. The default format can be queried through pdftools_toolbox.pdf.content.image.Image.default_image_type. Not all image types and conversions are supported, so using the default pdftools_toolbox.pdf.content.image_type.ImageType is advisable for best compatibility.
Key considerations:
The image is extracted from the page’s resources, without any context from the PDF page. The resolution and any transformations that affect the image’s appearance on the page, such as scaling, rotation, or cropping, are therefore not preserved in the extracted image.
If a pdftools_toolbox.generic_error.GenericError is raised, the output file may be corrupt and unusable.
This method retrieves images efficiently, without their page-specific modifications.
- Parameters:
stream (io.IOBase) – The image data stream.
image_type (Optional[pdftools_toolbox.pdf.content.image_type.ImageType]) – The desired image type of the extracted image stream. If the embedded image data cannot be extracted directly to the chosen ImageType, the data is first recompressed and then extracted to the chosen ImageType. In this case, extraction is slower and some image quality may be lost. The default image type can be retrieved from pdftools_toolbox.pdf.content.image.Image.default_image_type.
- Raises:
ValueError – if the stream argument is None
StateError – if the image has already been closed
pdftools_toolbox.generic_error.GenericError – if image extraction fails
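As an illustration of the call pattern described above, the following sketch extracts an image into memory. extract_to_bytes is a hypothetical helper, and image is assumed to be any object that behaves like the Image class documented here. Whether an in-memory io.BytesIO stream meets the library's stream requirements is an assumption.

```python
import io


def extract_to_bytes(image, image_type=None):
    """Hypothetical helper: extract `image` and return the encoded bytes.

    `image` is assumed to provide extract(stream, image_type=None) as
    documented above.
    """
    buffer = io.BytesIO()
    if image_type is None:
        # Leave the choice to the library (see default_image_type).
        image.extract(buffer)
    else:
        # May trigger recompression, which is slower and can lose quality.
        image.extract(buffer, image_type)
    return buffer.getvalue()
```

Passing no image_type follows the recommendation above to stay with the default type for best compatibility.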
- redact(rect: Rectangle) -> None
Redact a rectangular part of the image.
Redacts the part of the image specified by a rectangle by changing the content of the image. This is not an annotation: the image data is changed, and the original data cannot be recovered from the image itself. The content is changed by setting all pixels to the same color. This color is generally black, but it depends on the color space of the image.
- Parameters:
rect (pdftools_toolbox.geometry.real.rectangle.Rectangle) – Defines the rectangular part of the image to be redacted. If the rectangle is not completely within the image boundaries, only the part that lies within the boundaries is redacted.
- Raises:
ValueError – if the rect argument is invalid
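The clipping behavior described for rect can be pictured as a rectangle intersection. The sketch below is only an illustration: clip_to_image is a hypothetical helper, and the (left, bottom, right, top) tuple with an origin at (0, 0) is an assumed convention, since the coordinate system of rect is not stated here.

```python
def clip_to_image(rect, width, height):
    """Hypothetical helper: intersect rect with the image bounds.

    `rect` is an assumed (left, bottom, right, top) tuple; the image is
    assumed to span (0, 0) to (width, height). Returns None when the
    rectangle lies entirely outside the image.
    """
    left, bottom, right, top = rect
    left, bottom = max(left, 0), max(bottom, 0)
    right, top = min(right, width), min(top, height)
    if left >= right or bottom >= top:
        return None
    return (left, bottom, right, top)
```

The library applies its own clipping, so a helper like this is only useful for predicting which region will actually be changed.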
- property default_image_type: ImageType
Default extracted image type.
The default image type that will be extracted, based on the way the image data is compressed and stored in the PDF file. The type of the output image is pdftools_toolbox.pdf.content.image_type.ImageType.JPEG for embedded JPEG and JPEG2000 images. In all other cases the image type is pdftools_toolbox.pdf.content.image_type.ImageType.TIFF.
- Returns:
pdftools_toolbox.pdf.content.image_type.ImageType
- Raises:
StateError – if the image has already been closed
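Because the default type is either JPEG or TIFF, it can be used to pick a file name extension before extraction. The sketch below assumes that ImageType members expose a name attribute, as Python enum members do. extension_for and the ".bin" fallback are hypothetical.

```python
def extension_for(image_type):
    """Hypothetical helper: map an image type to a file extension.

    Only the JPEG and TIFF members named above are handled; anything else
    falls back to ".bin".
    """
    # Assumption: the type behaves like an enum member with a `name`.
    name = getattr(image_type, "name", str(image_type)).upper()
    return {"JPEG": ".jpg", "TIFF": ".tiff"}.get(name, ".bin")
```

For example, a caller might build the output path as "image" + extension_for(image.default_image_type).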
- property size: Size
The size of the image in samples.
Samples are often also called pixels.
- Returns:
pdftools_toolbox.geometry.integer.size.Size
- Raises:
StateError – if the image has already been closed
- property samples: List[int]
The raw content of the image.
The samples (pixels) are given in order, top to bottom, left to right. Each sample is given component by component. There is no padding between components or samples, except that each row of sample data begins on a byte boundary. If the number of data bits per row is not a multiple of 8, the end of the row is padded with extra bits to fill out the last byte. Padding bits should be ignored.
Most often, each component is 8 bits, so there’s no packing/unpacking or alignment/padding. Components with 2 or 4 bits are very rare.
If the image is compressed, it will be decompressed in order to get the samples. For very large images, this may take some time.
When setting samples, the original compression type of the image does not change. Compression from the raw samples typically takes significantly longer than decompression, so setting samples for large images may be slow. None of the image parameters can be changed, so when setting samples, the size of the array must match that of the original image.
- Returns:
List[int]
- Raises:
StateError – if the image has already been closed
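The row padding rule described above determines how long the sample data must be. The following sketch computes it under the assumption that each list element holds one byte. Both function names are hypothetical, and the number of components per sample has to come from the color space.

```python
def row_stride_bytes(width, components, bits_per_component):
    """Bytes per row, rounded up so each row starts on a byte boundary."""
    bits_per_row = width * components * bits_per_component
    return (bits_per_row + 7) // 8


def expected_sample_bytes(width, height, components, bits_per_component):
    """Total byte count for the image, including row padding."""
    return height * row_stride_bytes(width, components, bits_per_component)
```

For a 10-sample-wide, 1-bit, single-component image, each row holds 10 data bits and therefore occupies 2 bytes, with 6 padding bits at the end of the row.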
- property bits_per_component: int
The number of bits per component.
The number of bits used to represent each color component. Only a single value may be specified; the number of bits is the same for all color components. Valid values are 1, 2, 4, and 8.
- Returns:
int
- Raises:
StateError – if the image has already been closed
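For components narrower than 8 bits, several values share one byte. The sketch below unpacks one row, assuming the high-order bits of each byte come first, which is the convention for PDF image data. Whether samples follows that order is an assumption here. unpack_row is a hypothetical helper.

```python
def unpack_row(row_bytes, count, bits_per_component):
    """Hypothetical helper: return `count` component values from one row.

    Assumes components are packed high-order bits first; trailing padding
    bits in the last byte are ignored.
    """
    if bits_per_component not in (1, 2, 4, 8):
        raise ValueError("bits_per_component must be 1, 2, 4, or 8")
    mask = (1 << bits_per_component) - 1
    values = []
    for index in range(count):
        bit_offset = index * bits_per_component
        byte = row_bytes[bit_offset // 8]
        shift = 8 - bits_per_component - (bit_offset % 8)
        values.append((byte >> shift) & mask)
    return values
```

With 8-bit components the function returns the bytes unchanged, which matches the common case where no unpacking is needed.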
- property color_space: ColorSpace
The color space in which image samples are specified.
- Returns:
pdftools_toolbox.pdf.content.color_space.ColorSpace
- Raises:
StateError – if the image has already been closed