Originally published April 3, 2018 @ 2:07 pm

The other day I ran into the “Flexible Import/Export” article by Bruce Byfield in the March 2018 issue of Linux Pro Magazine and thought it could use some more detail. So here’s some more detail.

The unoconv utility relies on LibreOffice to do the actual conversions. All examples below were run on RHEL 7.3. The first step is to get the latest version of LibreOffice. This is not necessary, but may save you some time and aggravation.

Remove any existing installations of LibreOffice and install the latest stable release from one of the project’s mirrors:

yum -y remove openoffice* libreoffice*
cd && v="6.1.1" && wget http://ftp.utexas.edu/libreoffice/libreoffice/stable/${v}/rpm/x86_64/LibreOffice_${v}_Linux_x86-64_rpm.tar.gz
tar xfz LibreOffice_${v}_Linux_x86-64_rpm.tar.gz
cd LibreOffice_${v}*_Linux_x86-64_rpm/RPMS
yum -y install *rpm

Now download a more up-to-date version of unoconv and replace any copy already installed on your system. Once again, this is not necessary, but is a good idea.

cd && git clone https://github.com/dagwieers/unoconv.git
/bin/cp -pf unoconv/unoconv /usr/bin

Add a systemd unit file for the unoconv listener and, if your system is using SELinux, set the appropriate SELinux boolean.

cat << EOF > /etc/systemd/system/unoconv.service
[Unit]
Description=Unoconv listener for document conversions
Documentation=https://github.com/dagwieers/unoconv
After=network.target remote-fs.target nss-lookup.target

[Service]
Type=simple
Environment="UNO_PATH=/usr/lib64/libreoffice/program"
ExecStart=/usr/bin/unoconv --listener

[Install]
WantedBy=multi-user.target
EOF

systemctl enable unoconv.service
systemctl start unoconv.service

f=/etc/sysconfig/selinux
if [ -f "${f}" ] && [ "$(grep -oP "(?<=^SELINUX=)[a-z]{1,}(?=$)" "${f}")" != "disabled" ]; then
setsebool -P httpd_execmem on
fi
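
Before running any conversions it can help to confirm that the listener is actually accepting connections. Here is a minimal sketch, assuming bash (it relies on the /dev/tcp redirection) and unoconv's default listener address of 127.0.0.1, port 2002. The function name listener_up is made up for this example.

```shell
#!/bin/bash
# Return 0 if something accepts TCP connections on host:port, non-zero otherwise.
# The defaults assume unoconv's stock listener address (127.0.0.1:2002).
listener_up() {
    local host="${1:-127.0.0.1}" port="${2:-2002}"
    # Open the socket in a subshell so the descriptor does not leak into the caller.
    (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}
```

For example: listener_up || systemctl restart unoconv.service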

Now with the installation out of the way, here come the examples.

# Convert DOCX to PDF
i="Document Name"; unoconv -f pdf -o "./output/${i}.pdf" "${i}.docx"

# Convert DOCX to password-protected PDF
i="Document Name"; unoconv -f pdf -o "./output/${i}.pdf" -e EncryptFile=true -e DocumentOpenPassword=admin123 "${i}.docx"

# Convert pages 2-3 of DOCX to PDF that cannot be printed unless permissions are unlocked using a password
i="Document Name"; unoconv -f pdf -o "./output/${i}.pdf" -e EncryptFile=true -e Printing=0 -e RestrictPermissions=true -e PermissionPassword=admin123 -e PageRange=2-3 "${i}.docx"

# Convert multiple Word documents in the current directory to PDF
find . -maxdepth 1 -mindepth 1 -type f -regextype posix-extended -regex '^.*\.(docx|doc)$' | while IFS= read -r i; do n=$(basename "${i%.*}"); unoconv -f pdf -o "./output/${n}.pdf" "${i}" 2>/dev/null; done
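
For batch jobs, each output name has to be derived from the matching input name. The following is a rough sketch of one way to do that, assuming bash and GNU find. The helper names out_path and convert_all_to_pdf are hypothetical and not part of unoconv.

```shell
#!/bin/bash
# Map an input path to "<outdir>/<name>.<ext>", dropping the directory
# part and the old extension.
out_path() {
    local src="$1" outdir="$2" ext="$3" base
    base=$(basename -- "${src}")
    printf '%s/%s.%s\n' "${outdir}" "${base%.*}" "${ext}"
}

# Convert every .doc/.docx file in the current directory to PDF under ./output.
# NUL-delimited names keep files with spaces or newlines intact.
convert_all_to_pdf() {
    mkdir -p ./output
    find . -maxdepth 1 -type f \( -name '*.docx' -o -name '*.doc' \) -print0 |
        while IFS= read -r -d '' f; do
            unoconv -f pdf -o "$(out_path "${f}" ./output pdf)" "${f}"
        done
}
```

Calling convert_all_to_pdf from the directory that holds the Word files should leave one PDF per document in ./output.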

# Convert DOCX to multiple JPG
i="Document Name"; unoconv -f pdf -o "./output/${i}.pdf" "${i}.docx"
j=$(strings < "./output/${i}.pdf" | sed -n 's|.*/Count -\{0,1\}\([0-9]\{1,\}\).*|\1|p' | sort -rn | head -n 1)
k=1
while [ ${k} -le ${j} ]; do
unoconv -f pdf -o "./output/${i}_page_${k}.pdf" -e PageRange=${k}-${k} -e UseLosslessCompression=true "./output/${i}.pdf"
unoconv -f jpg -o "./output/${i}_page_${k}.jpg" -e Quality=94 "./output/${i}_page_${k}.pdf"
/bin/rm "./output/${i}_page_${k}.pdf"
(( k = k + 1 ))
done
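
The page count in the loop above comes from the /Count entries inside the PDF. Below is a small sketch of the same idea wrapped in a function, using grep instead of strings and sed. It assumes the page tree is stored uncompressed in the file and that the largest /Count value is the total number of pages, which is a heuristic rather than a guarantee. The name pdf_page_count is hypothetical.

```shell
#!/bin/bash
# Print the largest "/Count N" value found in a PDF, or nothing if none is found.
# Heuristic: only works when the page tree is not inside a compressed object stream.
pdf_page_count() {
    LC_ALL=C grep -a -oE '/Count[[:space:]]+[0-9]+' -- "$1" |
        LC_ALL=C grep -oE '[0-9]+' |
        sort -rn |
        head -n 1
}
```

Since the function prints nothing when no /Count entry is found, it is worth checking the result before looping, for example: j=$(pdf_page_count "./output/${i}.pdf"); [ -n "${j}" ] || echo "cannot determine page count"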

# Convert DOCX to multiple JPG of specified resolution and dimensions
# Requires 'convert' utility: 
# yum -y install ImageMagick
i="Document Name"; unoconv -f pdf -o "./output/${i}.pdf" "${i}.docx"
j=$(strings < "./output/${i}.pdf" | sed -n 's|.*/Count -\{0,1\}\([0-9]\{1,\}\).*|\1|p' | sort -rn | head -n 1)
k=1
while [ ${k} -le ${j} ]; do
unoconv -f pdf -o "./output/${i}_page_${k}.pdf" -e PageRange=${k}-${k} -e UseLosslessCompression=true "./output/${i}.pdf"
convert -density 400 "./output/${i}_page_${k}.pdf" -resize 2000x1500 "./output/${i}_page_${k}.jpg"
/bin/rm "./output/${i}_page_${k}.pdf"
(( k = k + 1 ))
done

# Convert XLSX to CSV
# Limitation: only the first sheet is converted
i="Spreadsheet Name"; unoconv -f csv -d spreadsheet -o "./output/${i}.csv" "${i}.xlsx"

You can find additional options for the unoconv utility’s PDF import/export functionality here. There was some talk about adding a command-line option to unoconv to allow the user to specify the sheet name or number during the conversion of a multi-sheet spreadsheet.

I don’t know if anything came of this. I was not able to find a version of unoconv with this capability. So as not to leave this question unanswered, here’s how you can use the xlsx2csv tool to work with multi-sheet spreadsheets.

# Convert XLSX to CSV using xlsx2csv
# https://github.com/dilshod/xlsx2csv
# Install xlsx2csv
cd && git clone https://github.com/dilshod/xlsx2csv.git && cd xlsx2csv && /bin/cp -p xlsx2csv.py /usr/bin/xlsx2csv

# Convert sheets 1-10, remove empty or non-existent sheets
for j in $(seq 1 10); do xlsx2csv -s ${j} "${i}.xlsx" "./output/${i}_sheet_${j}.csv" 2>/dev/null; if [ $? -ne 0 ] || [ ! -s "./output/${i}_sheet_${j}.csv" ]; then /bin/rm -f "./output/${i}_sheet_${j}.csv"; fi; done

# Convert all sheets. This will create a subfolder with CSV files named after every sheet
xlsx2csv -a "${i}.xlsx" "./output/${i}"
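
If the all-sheets conversion leaves behind empty CSV files, they can be pruned afterwards in the same spirit as the per-sheet loop. This is only a sketch, assuming GNU find (for -empty); the function name prune_empty_csv is made up for this example.

```shell
#!/bin/bash
# Delete zero-length .csv files below the given directory (default: ./output).
prune_empty_csv() {
    local dir="${1:-./output}"
    find "${dir}" -type f -name '*.csv' -empty -exec rm -f -- {} +
}
```

For example: prune_empty_csv "./output/${i}"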