Remove White Space from PDF file [Linux]

Tagged: Cropped PDF, Linux, PDF, Remove White Space PDF, Scanned PDF

This topic has 1 reply, 1 voice, and was last updated 17 hours, 11 minutes ago by thumbtak.

March 30, 2025 at 3:55 pm #8029

Keymaster

I scanned a document of more than 30 pages and wanted an easy way to crop every page to remove the white space. This is how I did it.

Note: Create a new, dedicated folder and place only the PDF file you intend to process inside. The corrected PDF, named ‘cropped.pdf’, will be saved to this folder upon completion.

$ pdfimages -j input.pdf page
$ mogrify -trim page-*.jpg
$ mogrify -resize 1600x1200 page-*.jpg
$ convert page-*.jpg cropped.pdf
$ rm -f page-*.jpg
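
The commands above rely on pdfimages (packaged as poppler-utils on Debian and Ubuntu, as far as I know) and on ImageMagick's mogrify and convert. Here is a minimal sketch for checking that they are installed before starting. The function name missing_tools is made up for this example and is not part of the original post:

```shell
# Hypothetical helper: print every command that cannot be found on PATH.
# Returns 0 when all commands are available, 1 when at least one is missing.
missing_tools() {
    _status=0
    for _tool in "$@"; do
        if ! command -v "$_tool" >/dev/null 2>&1; then
            echo "$_tool"
            _status=1
        fi
    done
    return "$_status"
}

# Example call (commented out so that pasting the block has no side effects):
# missing_tools pdfimages mogrify convert || echo "Install the tools listed above first."
```

If the function prints nothing, all three tools were found and the commands should be able to run.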

April 4, 2025 at 9:24 pm #8035

thumbtak

Keymaster

If you need to do this often, here is a Bash script you can save and reuse.

#!/bin/bash

# Script to extract images from a PDF, process them, and create a new PDF.

# Ask the user for the input PDF filename
read -r -p "Enter the name of the input PDF file: " INPUT_PDF

# Check if the input PDF file was provided
if [ -z "$INPUT_PDF" ]; then
    echo "Error: No input PDF file specified."
    echo "Usage: $0"
    exit 1
fi

# Check if the input PDF file exists
if [ ! -f "$INPUT_PDF" ]; then
    echo "Error: Input PDF file '$INPUT_PDF' not found."
    exit 1
fi

OUTPUT_BASE="cropped" # Default base name for output files

echo "Extracting images from '$INPUT_PDF'..."
pdfimages -j "$INPUT_PDF" page

echo "Trimming whitespace from extracted images..."
mogrify -trim page-*.jpg

echo "Resizing images to a maximum of 1600x1200..."
mogrify -resize 1600x1200 page-*.jpg

# Ask the user for the desired output PDF filename
read -r -p "Enter the desired name for the final PDF file (without extension): " OUTPUT_NAME

if [ -z "$OUTPUT_NAME" ]; then
    FINAL_PDF="${OUTPUT_BASE}.pdf"
    echo "Using default output filename: '$FINAL_PDF'"
else
    FINAL_PDF="${OUTPUT_NAME}.pdf"
fi

echo "Creating the final PDF: '$FINAL_PDF'..."
convert page-*.jpg "$FINAL_PDF"

echo "Removing temporary JPEG files..."
rm -f page-*.jpg

echo "Processing complete. The final PDF is: '$FINAL_PDF'"
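
One thing the script does not check is whether the extraction produced any JPEG files. As far as I know, pdfimages -j only writes .jpg files for images that are stored as JPEG inside the PDF. Other images come out as .ppm or .pbm, and then page-*.jpg matches nothing. Here is a small sketch of a guard that could go right after the pdfimages line. The name has_extracted_jpegs is a hypothetical helper, not something from the script:

```shell
# Hypothetical helper: succeed only if at least one file named PREFIX-*.jpg
# exists in the current directory. PREFIX is the image root given to
# pdfimages ("page" in the script).
has_extracted_jpegs() {
    for _f in "$1"-*.jpg; do
        # When nothing matches, the pattern stays literal and -e is false.
        if [ -e "$_f" ]; then
            return 0
        fi
    done
    return 1
}

# Example use inside the script, after the pdfimages call (commented out here):
# if ! has_extracted_jpegs page; then
#     echo "Error: no JPEG images were extracted from '$INPUT_PDF'."
#     exit 1
# fi
```

Stopping early this way avoids running mogrify and convert on a pattern that matches no files.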

How to use this script:

1. Save the code: Save the script above (the block that starts with #!/bin/bash) into a file, for example, process_pdf.sh.

2. Make it executable: Open your terminal and navigate to the directory where you saved the file. Then run the command:
$ chmod +x process_pdf.sh

3. Run the script: Execute the script without any arguments:
$ bash process_pdf.sh

The script will first ask you:

Enter the name of the input PDF file:

Type the name of your PDF file and press Enter. After processing the images, the script asks for the output file name. Press Enter without typing anything to use the default, cropped.pdf:

Enter the desired name for the final PDF file (without extension):
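
The second prompt expects a name without the extension, and the script falls back to cropped.pdf when the answer is empty. If someone types report.pdf anyway, the script as written would produce report.pdf.pdf. Here is a minimal sketch of a helper that accepts either form. The name final_pdf_name is hypothetical and not part of the original script:

```shell
# Hypothetical helper: turn the user's answer into the output file name.
# An empty answer falls back to "cropped"; a trailing ".pdf" is dropped
# before the extension is added, so it never appears twice.
final_pdf_name() {
    _name=${1:-cropped}
    _name=${_name%.pdf}
    printf '%s.pdf\n' "$_name"
}

# Example use in the script, replacing the if/else block (commented out here):
# FINAL_PDF=$(final_pdf_name "$OUTPUT_NAME")
```

This only handles a lowercase .pdf suffix, so an answer such as REPORT.PDF would still get a second extension.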
