Willie Alberty
2008-04-05 02:53:59 UTC
For the last few months I have been working with Kevin McArthur on a
comprehensive PDF generation project for a client [Streamflow] who has
some pretty advanced layout needs. The project is nearing completion
and we have been discussing the possibility of contributing large
portions of the code back to the Zend Framework as improvements to
Zend_Pdf.
In light of several recent postings to fw-general and fw-formats, as
well as a few encouraging proposals recently submitted to the Wiki, we
would like to formally announce our plans and describe the new
functionality at a high level here.
We will be submitting proposals in the coming weeks that describe
these new components in more detail along with fully-functional
reference implementations. Our hope is to join forces with other
interested developers to help fast-track these proposals through the
feedback and approval process, write tests, user documentation, and
examples, and exercise the code as much as possible.
We're really proud of this work and are excited to share it with the
community. We believe that these enhancements will further establish
Zend_Pdf's role as the gold standard for PDF generation using PHP.
Text Layout Engine
------------------
"How do I wrap long lines of text?" This is probably the most commonly
asked question regarding Zend_Pdf. I'm pleased to report that we have
solved not only text wrapping but a whole host of other layout
problems as well. The new engine provides fully automatic text layout
and has customization hooks in a variety of places.
Line breaks are calculated using the Unicode Line Breaking Algorithm
(UAX #14), providing linguistically-appropriate line breaks, not just
at whitespace characters.
Paragraph styles allow you to specify left-, center-, and right-
alignment, as well as full justification, line leading, line height,
line multiple (double-space, triple-space, etc.), pre- and post-
paragraph spacing, left- and right-side margins, and first-line
indentation. Paragraph styles also support left-, center-, right-, and
decimal-aligned tab stops, with or without leaders, for intra-line
alignment needs.
In addition to the left-to-right line sweep used by most Latin-based
scripts, right-to-left line sweep is also supported, and is
automatically detected by the layout engine; you never need to supply
strings in reverse character order for right-to-left text layout.
The layout engine is based around the concept of an attributed string.
These are Unicode strings of unlimited length, and fully support the
entire Unicode character set, including characters outside the Basic
Multilingual Plane (BMP).
Attributed strings allow you to assign stylistic attributes to
arbitrary ranges of characters within the string. These attributes are
used by typesetters to determine the specific look and location for
every character. This means that you can make unlimited style changes
within a block of text, even changing styles character-by-character if
desired.
The layout engine automatically manages all of these style changes,
applying them as necessary when drawing the text on the page. The
following style attributes are supported:
- Font
- Font size
- Fill color
- Stroke width and color
- Underline and strikethrough
- Super- and sub-script
- Background color
You can add your own custom attributes as well, which you can use in
your own subclasses to completely customize the layout engine's
behavior.
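
To make the attributed-string idea concrete, here is a minimal sketch.
It is written in Python purely for illustration and is not the proposed
Zend_Pdf API: the class name, method names, and attribute keys are all
hypothetical. It stores one attribute set per character and groups
adjacent characters with identical attributes into runs, which is
roughly what a typesetter would consume.

```python
class AttributedString:
    """Hypothetical attributed string: text plus per-character attributes."""

    def __init__(self, text):
        self.text = text
        # One attribute dict per character. A real implementation would
        # more likely store runs directly; this keeps the sketch simple.
        self._attrs = [{} for _ in text]

    def set_attribute(self, name, value, start, length):
        """Apply one attribute to the characters in [start, start + length)."""
        if start < 0 or length < 0 or start + length > len(self.text):
            raise ValueError("range out of bounds")
        for i in range(start, start + length):
            self._attrs[i][name] = value

    def runs(self):
        """Yield (substring, attributes) for each run of identical attributes."""
        run_start = 0
        for i in range(1, len(self.text) + 1):
            if i == len(self.text) or self._attrs[i] != self._attrs[run_start]:
                yield self.text[run_start:i], dict(self._attrs[run_start])
                run_start = i


# Usage: one font size for the whole string, a fill color for one word.
s = AttributedString("Hello, world")
s.set_attribute("font_size", 12, 0, 12)
s.set_attribute("fill_color", "red", 7, 5)
for substring, attributes in s.runs():
    print(repr(substring), attributes)
```

Because any key can be stored, a custom attribute is just another name
in the dictionary; a subclass of the (hypothetical) typesetter would
decide what to do with it.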
These attributed strings will eventually be shared with Zend_Rtf
(recently proposed by Andries Seutens), as each attributed string is
essentially a self-contained RTF document. This opens up the
possibility for generating fully-styled PDF or RTF output from the
same source with only a couple of lines of code. It will also
eventually be possible to use existing styled RTF documents as the
basis for PDF text drawing, eliminating the need to manually apply
style attributes in your PHP code.
A layout manager class is responsible for drawing these attributed
strings. It lays out the text in a series of arbitrarily-shaped text
containers, automatically moving from one to the next as each is
filled. Rectangular and circular containers will be provided, but you
can easily create your own custom containers for other shapes or to
flow text around images.
Multi-column output is as easy as creating two adjacent text
containers on the same page. Text containers don't even need to be on
the same PDF page: you can start your text in a small container on
page 1, then continue it on page 17.
Callback functions are provided to allow you to create additional text
containers as needed, which can be located on new pages. This is
useful if you do not know the length of the text you are drawing ahead
of time, or if you want to adapt your layout on-the-fly.
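
The container flow and the callback can be sketched as follows. This is
an illustrative Python model, not the actual implementation: containers
here simply hold a fixed number of already-broken lines, and the names
TextContainer, LayoutManager, and on_need_container are assumptions
made for the example.

```python
class TextContainer:
    """Hypothetical container that can hold a fixed number of text lines."""

    def __init__(self, name, max_lines):
        if max_lines < 1:
            raise ValueError("a container must hold at least one line")
        self.name = name
        self.max_lines = max_lines
        self.lines = []

    def is_full(self):
        return len(self.lines) >= self.max_lines


class LayoutManager:
    """Fills containers in order, asking a callback for more when needed."""

    def __init__(self, containers, on_need_container=None):
        self.containers = list(containers)
        # Called with the number of containers so far; returns a new
        # container, or None to stop the layout.
        self.on_need_container = on_need_container

    def lay_out(self, lines):
        """Place each line; return False if the text did not all fit."""
        index = 0
        for line in lines:
            while True:
                if index == len(self.containers):
                    new = None
                    if self.on_need_container is not None:
                        new = self.on_need_container(index)
                    if new is None:
                        return False
                    self.containers.append(new)
                if not self.containers[index].is_full():
                    break
                index += 1  # current container is full, move to the next
            self.containers[index].lines.append(line)
        return True


# Usage: start in a small container, then add a new "page" on demand.
def add_page(count):
    return TextContainer(f"page {count + 1}", max_lines=3)

manager = LayoutManager([TextContainer("page 1", max_lines=2)], add_page)
manager.lay_out([f"line {i}" for i in range(6)])
for container in manager.containers:
    print(container.name, container.lines)
```

A real container would report how much horizontal space is available at
a given vertical position rather than a line count, which is what makes
arbitrary shapes possible.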
You can also use multiple layout managers on a single page, allowing
you to create complex multi-page flows for a series of text runs.
These can be useful for creating page headers and footers, or for
running stories side-by-side in a newsletter.
Drawing Model
-------------
Three new primitive geometry classes allow you to precisely define
drawing locations, sizes, and regions. They also provide a host of
convenience functions allowing for calculation, conversion,
intersection testing, etc.:
- Point: x and y coordinate
- Size: height and width
- Rectangle: combination of a point and size
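
As a rough illustration of the three primitives and one of the
convenience operations (intersection testing), here is a small Python
sketch. The field and method names are assumptions made for the
example; the proposals will define the real interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Point:
    x: float
    y: float


@dataclass(frozen=True)
class Size:
    width: float
    height: float


@dataclass(frozen=True)
class Rectangle:
    origin: Point  # corner with the smallest x and y
    size: Size

    @property
    def max_x(self):
        return self.origin.x + self.size.width

    @property
    def max_y(self):
        return self.origin.y + self.size.height

    def contains(self, point):
        """True if the point lies inside (far edges are excluded)."""
        return (self.origin.x <= point.x < self.max_x
                and self.origin.y <= point.y < self.max_y)

    def intersection(self, other) -> Optional["Rectangle"]:
        """Return the overlapping region, or None if there is no overlap."""
        x0 = max(self.origin.x, other.origin.x)
        y0 = max(self.origin.y, other.origin.y)
        x1 = min(self.max_x, other.max_x)
        y1 = min(self.max_y, other.max_y)
        if x1 <= x0 or y1 <= y0:
            return None  # disjoint, or touching only along an edge
        return Rectangle(Point(x0, y0), Size(x1 - x0, y1 - y0))


# Usage: find where a text box overlaps an image box.
text_box = Rectangle(Point(0, 0), Size(100, 50))
image_box = Rectangle(Point(60, 20), Size(100, 100))
print(text_box.intersection(image_box))
```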
PDF pages are drawn using a series of content streams, which contain
all of the low-level drawing commands. Zend_Pdf_Page currently manages
its own private content stream.
We've separated content streams from Zend_Pdf_Page, promoting them to
first-class objects. This allows us to use these content streams as
templates that can be reused again and again, either on a single page
or multiple pages. Templates can greatly reduce PDF file sizes and
improve memory use and performance in PDF viewer applications.
It is also possible to create a template from any page in an existing
PDF document. You can then reuse the template in the same PDF, or even
copy it to a new PDF document, where you can use it as a page
background, draw it as a thumbnail, perform imposition, etc.
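
The file-size benefit of templates can be shown with a toy model. The
Python sketch below is not the proposed API (Document, Page, and
ContentStream are hypothetical names); it only counts bytes. It assumes
a template's drawing commands are stored once and each use costs a
short reference, in the spirit of the PDF "Do" operator that paints a
named XObject.

```python
class ContentStream:
    """Hypothetical first-class content stream holding drawing commands."""

    def __init__(self, commands):
        self.commands = commands


class Page:
    def __init__(self):
        self.inline_commands = ""  # commands drawn directly on this page
        self.template_names = []   # templates referenced by name

    def draw_inline(self, stream):
        self.inline_commands += stream.commands

    def draw_template(self, name):
        self.template_names.append(name)


class Document:
    def __init__(self):
        self.templates = {}
        self.pages = []

    def content_size(self):
        """Bytes of drawing commands, counting each template body once."""
        size = sum(len(s.commands) for s in self.templates.values())
        for page in self.pages:
            size += len(page.inline_commands)
            size += sum(len(f"/{n} Do\n") for n in page.template_names)
        return size


# Usage: the same letterhead on 50 pages, inlined versus shared.
letterhead = ContentStream("0 0 m 10 10 l S\n" * 100)

inlined = Document()
shared = Document()
shared.templates["Letterhead"] = letterhead
for _ in range(50):
    page = Page()
    page.draw_inline(letterhead)
    inlined.pages.append(page)

    page = Page()
    page.draw_template("Letterhead")
    shared.pages.append(page)

print(inlined.content_size(), shared.content_size())
```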
Performance and Memory
----------------------
We've also made numerous performance and memory-usage improvements
throughout the code. Most data is now lazily loaded, allowing you to
manipulate very large documents (containing thousands or millions of
individual objects, or hundreds of megabytes or even gigabytes in
size) with a very low memory footprint.
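
The general idea behind lazy loading can be sketched in a few lines.
This Python example is an assumption about the technique, not a
description of our code: it keeps only a table of byte offsets in
memory and decodes an object the first time it is requested, caching
the result. The class name and the parse_count counter are hypothetical.

```python
class LazyObjectTable:
    """Decodes objects from a byte buffer only when they are first used."""

    def __init__(self, data, offsets):
        self._data = data        # raw file contents
        self._offsets = offsets  # {object number: (start, length)}
        self._cache = {}
        self.parse_count = 0     # how many objects were actually decoded

    def __getitem__(self, number):
        if number not in self._cache:
            start, length = self._offsets[number]
            raw = self._data[start:start + length]
            self._cache[number] = raw.decode("ascii")
            self.parse_count += 1
        return self._cache[number]


# Usage: three objects in the file, but only the one we touch is decoded.
table = LazyObjectTable(
    b"alphabetagamma",
    {1: (0, 5), 2: (5, 4), 3: (9, 5)},
)
print(table[2], table.parse_count)
```

A production version would read from the file on demand instead of
holding the whole buffer, and would evict cached objects under memory
pressure.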
Future Enhancements
-------------------
All of this new functionality lays the groundwork for even more
powerful enhancements down the road:
- Top-to-bottom line sweep for Asian scripts
- Bi-directional text (for Hebrew, Arabic, and others)
- Bulleted and numbered text lists
- HTML-inspired inline text tables
- Inline attachments (for example, images that flow with text)
- Advanced typographic features such as tracking, pairwise kerning,
  ligatures, etc.
- Hyphenation support
- Glyph substitution using fallback fonts
- and more...
Again, we're really excited to be sharing this code with the
community. We'll be creating the proposals for the various components
in the coming weeks and announcing them on the fw-formats list when
they're ready for review. In the meantime, if you have any high-level
questions, please don't hesitate to ask.
--
Willie Alberty, Owner
Spenlen Media
willie-***@public.gmane.org
http://www.spenlen.com/