By Karl-Heinz Herrmann
Why would I want to use a command line tool to generate a WYSIWYG/GUI Document?
Well, I don't know about you, but I am not very keen on boring, repetitive clicking through a GUI when the task at hand doesn't need individual care for every page. Specifically, I had a presentation created with LaTeX/beamer in PDF format and the "boss" wanted a PowerPoint file. Or your paper is accepted at this great/important conference and you find out they will only accept PPT files for a talk: no PDF, no Impress, no laptop of your own.
So, I found myself importing a series of identically-sized images into a PowerPoint presentation once too often (i.e. once) and got extremely bored. So bored that I started investigating if I could do it any other way, even if it took me a week to write the program. For me, the "perfect" solution would be to have a list of image files (e.g., created by 'ls *png > file'), run a script and -- voila -- get a PPT file, each slide containing exactly one of the images filling the whole slide.
Finding the tools
A Google (re)search showed nothing on automatic generation of PPT files, since their internal structure is too obscure to reverse-engineer. However, I did find information on the inner structure of OpenOffice files. These are simply compressed XML files and the DTD of the XML structure is published. After fast-forwarding through the 571 pages I was seriously hoping somebody else might have done some work on this already. Indeed, adding my favorite programming language Perl to the Google search words produced some interesting links:
* The ooolib
* The OpenOffice::OODoc perl Module
I decided to give the OpenOffice::OODoc module a try. To get the modules installed on my system I went to CPAN (the Comprehensive Perl Archive Network) by typing as root ([...] signifies unimportant left-out parts):
# perl -MCPAN -e shell
cpan shell -- CPAN exploration and modules installation (v1.7601)
cpan> install OpenOffice::OODoc
CPAN: Storable loaded ok
[...]
Running install for module OpenOffice::OODoc
Running make for J/JM/JMGDOC/OpenOffice-OODoc-1.309.tar.gz
Fetching with LWP:
ftp://ftp.perl.org/pub/CPAN/authors/id/J/JM/JMGDOC/OpenOffice-OODoc-1.309.tar.gz
[...]
OpenOffice-OODoc-1.309/
OpenOffice-OODoc-1.309/OODoc/
[...]
Checking if your kit is complete...
Looks good
Warning: prerequisite XML::Twig 3.15 not found.
---- Unsatisfied dependencies detected during [J/JM/JMGDOC/OpenOffice-OODoc-1.309.tar.gz] -----
XML::Twig
Shall I follow them and prepend them to the queue
of modules we are processing right now? [yes]
[...]
Writing /usr/lib/perl5/site_perl/5.8.3/x86_64-linux-thread-multi/auto/OpenOffice/OODoc/.packlist
Appending installation info to /usr/lib/perl5/5.8.3/x86_64-linux-thread-multi/perllocal.pod
/usr/bin/make install -- OK
cpan> quit
The cpan shell realized that a prerequisite, the module XML::Twig, was missing for OpenOffice::OODoc. It offered to fetch it automatically, which I asked it to do. Both Twig and OODoc asked some questions during the install, and I selected the defaults every time. After that, I had OpenOffice::OODoc and all the prerequisites on my system and could start reading the documentation.
The code
After glancing at the Introduction and some man pages, I started checking out the examples that came with the module. I found it interesting to find a solution for the reverse of my problem -- extracting all images from an OpenOffice document -- since there is another simple method to get the images out: just run 'unzip' on the *.swi, *.sxw or any other OpenOffice document. These documents are nothing but zip files containing the content, meta files and (in the Pictures subdirectory) all images of the document.
unzip -t img2ooImpressExport.swi
Archive: img2ooImpressExport.swi
testing: mimetype OK
testing: content.xml OK
testing: meta.xml OK
testing: styles.xml OK
testing: settings.xml OK
testing: META-INF/manifest.xml OK
testing: Pictures/Seite_1.jpg OK
testing: Pictures/Seite_2.jpg OK
No errors detected in compressed data of img2ooImpressExport.swi.
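The listing above only tests the archive. To actually pull the images out, something like the following should do. It is a minimal sketch, assuming the Info-ZIP 'unzip' is installed; 'extract_pictures' is a helper name made up for this illustration.

```shell
# Copy all embedded pictures of an OpenOffice document into a directory.
# (extract_pictures is a hypothetical helper, not part of any package.)
extract_pictures() {
    # $1: OpenOffice document, $2: output directory (created if needed)
    # -q quiet, -j drop the Pictures/ path, -o overwrite without asking
    mkdir -p "$2" && unzip -q -j -o "$1" 'Pictures/*' -d "$2"
}
```

For example, 'extract_pictures img2ooImpressExport.swi images' would leave only the contents of the Pictures subdirectory in 'images'.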
After some digging around in the documentation I came up with the following Perl code:
use OpenOffice::OODoc;
# start a new document
my $document = ooDocument
(
file => 'outputfile.swi',
create => 'presentation'
);
$document->createImageStyle("slide");
# loop over image file names (the filehandle LIST is opened outside of this
# snippet; its name is a placeholder)
my $i=1;
while (my $imgfile=<LIST>){
chomp($imgfile);
# start a new page/slide
my $page = $document->appendElement
('//office:body',0,'draw:page');
# include the image at full size
my $image= $document->createImageElement
(
"Slide".$i,
description => "image ".$i." filename:".$imgfile,
page => $page,
position => "0,0",
import => $imgfile,
size => "28cm, 21cm",
style => "slide"
);
$i++;
}
$document->save;
The complete script includes some file-handling and reads the image list. The hardest part was figuring out how to add a new page in an Impress presentation, since I could only find examples which modified a document (inserting an image after some text, etc.) but none that created new pages. Some digging in the other man-pages and the doc-folder (if you have 'locate' installed and ran 'updatedb' after the OODoc installation, 'locate OODoc' will find all of them) started me in the right direction.
In the created *swi file, the very first slide is left blank (named "First Page") and every other page contains one image, named consecutively "Slide N". I didn't bother figuring out how to drop that first page. Also, OpenOffice seems to have changed the file extensions between versions: Impress files could be *.swi or *.sxi. The script takes either one or two arguments: the required file with the list of images (one filename per line) and an optional output filename (default img2ooImpressExport.swi). The images are automatically aligned, so animations in the PDF (each animation step is a new page) will play out smoothly in Impress/PowerPoint. That was one of the most annoying things about manually importing single images into PPT -- realigning and rescaling so the images won't jump from slide to slide.
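Since the script depends on a list with one filename per line, it can save a failed run to check the list first. The following is only a sketch of such a check; 'check_image_list' is a name invented here and is not part of img2ooImpress.pl.

```shell
# Report every entry of an image list (one filename per line) that does not
# name a readable file. Returns non-zero if at least one entry is missing.
# (check_image_list is a hypothetical helper.)
check_image_list() {
    list=$1
    missing=0
    while IFS= read -r name || [ -n "$name" ]; do
        [ -n "$name" ] || continue          # skip empty lines
        if [ ! -r "$name" ]; then
            printf 'missing: %s\n' "$name" >&2
            missing=1
        fi
    done < "$list"
    return "$missing"
}
```

A call such as 'check_image_list imglist && img2ooImpress.pl imglist MyTalk.swi' would then start the conversion only when every listed image is there.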
If you would like to add a title or other text to the slides, you could just modify the position and size specification in the createImageElement block. The page specification anchors the image to the page. The Intro page gives an example where an image is anchored to a new paragraph. The Intro page is also accessible by:
man OpenOffice::OODoc::Intro
Other examples that came with the Perl module create spreadsheets from *.csv files ('oobuild') or create swriter files from text source ('text2ooo').
So what about PPT?
Impress is quite capable of opening the created file and exporting it to PPT format. Since each slide holds only an image, PowerPoint will not even be able to turn vector arrows into letters, much less mess about with text colors and other annoyances I see regularly at conferences.
How to convert from LaTeX/PDF files
A cheap way of converting a LaTeX-created PDF presentation (e.g., beamer, prosper or (limited in animation capabilities) TeXPower) is to convert every PDF page into an image and run the images through img2ooImpress.pl. The PDF-to-image conversion can be handled by ghostscript (gs). 'gs -h' will show some useful options and a list of formats it can export to. Look for pngalpha, png16m, and jpeg as useful image formats. If pngalpha (PNG with antialias rendering) is available, you can run something similar to:
gs -dNOPAUSE -g1024x768 -r205 -sDEVICE=pngalpha \
-sOutputFile=Talkimg_%d.png -dBATCH Talk.pdf
which creates 1024x768-sized, antialiased PNG images, consecutively numbered from Talkimg_1.png up to however many pages were in the PDF. The -r205 specifies the resolution which fits PDF files produced with 'pdflatex' and the beamer class. For other PDF files you will want to change the resolution so you fill the 1024x768 pixels as closely as possible. 'gs' either pads with white or just clips your pages if the '-r' does not match the '-g' option. With pages rendered exactly at destination resolution no additional scaling will occur and the images should look good on the screen. Alternatively, you can generate the images too large and let Impress do the scaling (image size is set to full page in the script) so they will fit on the slide, e.g.:
gs -dNOPAUSE -r300 -sDEVICE=pngalpha \
-sOutputFile=Talkimg_%d.png -dBATCH Talk.pdf
which again is OK for a beamer-class PDF file as the slides are rather small. For a PDF document in A4 landscape, 300 dpi is way too high (-r specifies the dpi resolution at which 'gs' should render the image). Switching from page to page will get really slow if the images are much larger than the actual screen resolution.
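The resolution that fills a given pixel width follows from the page width: dots per inch is the pixel count divided by the page width in inches. Here is a small sketch of that arithmetic; 'dpi_for_width' is a made-up helper name, and the page widths used in the example (128 mm for the beamer default, 297 mm for A4 landscape) are the nominal sizes as far as I know, so check them against your own PDF.

```shell
# Print the resolution (dots per inch) at which a page of the given width
# in millimetres comes out at the given width in pixels.
# (dpi_for_width is a hypothetical helper; 25.4 mm make one inch.)
dpi_for_width() {
    # $1: target width in pixels, $2: page width in millimetres
    awk -v px="$1" -v mm="$2" 'BEGIN { printf "%.1f\n", px * 25.4 / mm }'
}
```

'dpi_for_width 1024 128' prints 203.2, close to the -r205 used above, while 'dpi_for_width 1024 297' prints 87.6, which shows how far off 300 dpi is for an A4 landscape page.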
ls -rt *png > imglist
img2ooImpress.pl imglist MyTalk.swi
then converts the images into an Impress file. Since 'gs' numbers the images without padding zeros, i.e. 1, 2, ..., 10, 11, ..., a plain alphabetic listing would put page 10 before page 2; "ls -rt *png" instead sorts by file modification time, oldest first, and so gets the page sequence right. For some reason, this doesn't work right on all systems and the file list is still not sorted properly. There are several methods to create the filenames with enough leading zeros so they can be used in "alphabetic" order. If you have "mmv" installed, and you used a file name structure like File_[num].png, you could use:
mmv "*_?.png" "#1_0#2.png"
Here "mmv" replaces #1 with whatever the first wildcard ("*") matched and #2 with the single digit matched by "?", so every single-digit number gets one zero in front of it. It's straightforward to extend this to more padding zeros. Another option is a Perl-based renaming script as in the Perl Cookbook, or a slightly modified version which lets you test a regular expression until you tell it to actually do the rename by adding "-x" as the first option, i.e.:
rename.pl -x 's/(\d+)/sprintf "%03d", $1/e' Talkimg*png
which pads with leading zeros so all image numbers have three digits. A simple 'ls Talkimg*png > list' (with the wildcard left unquoted so the shell can expand it) will now create the properly sorted list for 'img2ooImpress.pl'.
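If neither "mmv" nor a renaming script is at hand, a plain shell loop can do the same padding. This is a sketch only: 'pad_names' is a name invented for this example, it assumes file names of the form prefix, number, dot, extension, and it pads to three digits.

```shell
# Rename PREFIX<number>.EXT in a directory so the number has three digits.
# (pad_names is a hypothetical helper written for this illustration.)
pad_names() {
    # $1: directory, $2: filename prefix, $3: extension without the dot
    dir=$1; prefix=$2; ext=$3
    for path in "$dir"/"$prefix"*."$ext"; do
        [ -e "$path" ] || continue          # the wildcard matched nothing
        num=${path##*/}                     # strip the directory
        num=${num#"$prefix"}                # strip the prefix
        num=${num%".$ext"}                  # strip the extension
        case $num in
            ''|*[!0-9]*) continue ;;        # not a plain number: leave alone
        esac
        # drop leading zeros so printf does not read 08 or 09 as octal
        num=$(printf '%s' "$num" | sed 's/^0*//')
        new=$dir/$prefix$(printf '%03d' "${num:-0}").$ext
        if [ "$path" != "$new" ] && [ ! -e "$new" ]; then
            mv "$path" "$new"
        fi
    done
}
```

Running 'pad_names . Talkimg_ png' in the image directory would turn Talkimg_1.png into Talkimg_001.png and Talkimg_10.png into Talkimg_010.png, and it never overwrites an existing file.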
Drawbacks
The one major drawback is that any navigational link in the PDF ('beamer' can add a full clickable table of contents in a sidebar) is lost. Text changes can only be added in the Impress/PPT file by overlaying boxes to hide the original text and adding new text on top of that. However, with the suggested small modification in the size and position specification, you could still create preformatted pages which only need the additional title and/or text; then, you could easily choose Insert-NewSlide, insert-Graphics-fromFile, align and resize to fit, etc.
Other uses?
With the rise of digital cameras and the disappearance of color slides (do you still have some?), why not create an Impress or PowerPoint presentation from your latest holiday photos? (assuming Bash usage):
cd your/image/dir
for i in *jpg; do
convert -geometry 1024x768 "$i" "$(basename "$i" .jpg)_1024.jpg"
done
ls *_1024.jpg > list
img2ooImpress.pl list img2ooImpressExport.swi
You could then run 'soffice img2ooImpressExport.swi' to see the resulting presentation.
The 'for' loop and 'convert' (from ImageMagick) scale the images down to 1024x768 (most projectors won't use anything larger) and you could throw in a gamma correction (-gamma x), watermark, added text, rotation, sharpening or whatever (see "man convert" for details).
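One thing to watch with such a loop: run a second time in the same directory, the '*jpg' wildcard also matches the already scaled copies. A variant that skips them could look like the sketch below; 'scaled_name' and 'scale_all' are names invented here, and the '_1024' suffix is simply the one used above.

```shell
# Print the name of the scaled copy: photo.jpg -> photo_1024.jpg
# (scaled_name and scale_all are hypothetical helpers.)
scaled_name() {
    printf '%s_1024.jpg\n' "${1%.jpg}"
}

# Scale every *.jpg in the current directory that is neither a scaled copy
# itself nor already has one.
scale_all() {
    for img in *.jpg; do
        [ -e "$img" ] || continue           # the wildcard matched nothing
        case $img in
            *_1024.jpg) continue ;;         # already a scaled copy
        esac
        out=$(scaled_name "$img")
        if [ ! -e "$out" ]; then
            convert -geometry 1024x768 "$img" "$out"
        fi
    done
}
```

Calling 'scale_all' in the image directory before 'ls *_1024.jpg > list' can then be repeated safely after adding new photos.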