
Remove White Space from PDF file [Linux]


  • #8029
    thumbtak
    Keymaster

    I had scanned a file that was over 30 pages, and I wanted to find an easy way to crop all the pages to remove white space. This is how I did it.

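    Note: Create a new, dedicated folder and place only the PDF file you intend to process inside it. The commands below write temporary page-*.jpg files to the current folder and delete them at the end, so a dedicated folder keeps them from mixing with your other images. The cropped PDF, named ‘cropped.pdf’, is saved to the same folder.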

    $ pdfimages -j input.pdf page
    $ mogrify -trim page-*.jpg
    $ mogrify -resize 1600x1200 page-*.jpg
    $ convert page-*.jpg cropped.pdf
    $ rm -f page-*.jpg
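
If you would rather not depend on a dedicated folder, the same steps can be run inside a throwaway directory. This is only a rough sketch: crop_pdf is a name I made up for illustration, and it assumes pdfimages, mogrify and convert are installed. Keep in mind that pdfimages -j only writes .jpg files for images stored as JPEG inside the PDF. Anything else comes out as .ppm or .pbm, in which case the page-*.jpg pattern matches nothing and the function returns a non-zero status.

```shell
#!/usr/bin/env bash
# Sketch: run the crop steps in a private temporary directory.
# crop_pdf is a hypothetical helper name, not part of the original post.
crop_pdf() {
    local input="$1"
    local output="${2:-cropped.pdf}"

    # Stop early if the input file does not exist.
    if [ ! -f "$input" ]; then
        echo "Error: '$input' not found." >&2
        return 1
    fi

    # Private working directory; nothing outside it is ever deleted.
    local workdir
    workdir=$(mktemp -d) || return 1

    # Same four steps as above, with the images kept in $workdir.
    pdfimages -j "$input" "$workdir/page" &&
        mogrify -trim "$workdir"/page-*.jpg &&
        mogrify -resize 1600x1200 "$workdir"/page-*.jpg &&
        convert "$workdir"/page-*.jpg "$output"
    local status=$?

    # Clean up the temporary images whether or not the steps succeeded.
    rm -rf -- "$workdir"
    return "$status"
}
```

After pasting the function into your shell, you would call it as: crop_pdf input.pdf cropped.pdf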
    #8035
    thumbtak
    Keymaster

    Here is a Bash script you can save if you need to do this often.

    #!/bin/bash
    
    # Script to extract images from a PDF, process them, and create a new PDF.
    
    # Ask the user for the input PDF filename (-r keeps backslashes literal)
    read -r -p "Enter the name of the input PDF file: " INPUT_PDF
    
    # Check if the input PDF file was provided
    if [ -z "$INPUT_PDF" ]; then
        echo "Error: No input PDF file specified." >&2
        echo "Usage: $0" >&2
        exit 1
    fi
    
    # Check if the input PDF file exists
    if [ ! -f "$INPUT_PDF" ]; then
        echo "Error: Input PDF file '$INPUT_PDF' not found."
        exit 1
    fi
    
    OUTPUT_BASE="cropped" # Default base name for output files
    
    echo "Extracting images from '$INPUT_PDF'..."
    pdfimages -j "$INPUT_PDF" page
    
    echo "Trimming whitespace from extracted images..."
    mogrify -trim page-*.jpg
    
    echo "Resizing images to a maximum of 1600x1200..."
    mogrify -resize 1600x1200 page-*.jpg
    
    # Ask the user for the desired output PDF filename (-r keeps backslashes literal)
    read -r -p "Enter the desired name for the final PDF file (without extension): " OUTPUT_NAME
    
    # Fall back to the default base name if nothing was entered
    if [ -z "$OUTPUT_NAME" ]; then
        FINAL_PDF="${OUTPUT_BASE}.pdf"
        echo "Using default output filename: '$FINAL_PDF'"
    else
        FINAL_PDF="${OUTPUT_NAME}.pdf"
    fi
    
    echo "Creating the final PDF: '$FINAL_PDF'..."
    convert page-*.jpg "$FINAL_PDF"
    
    echo "Removing temporary JPEG files..."
    rm -f page-*.jpg
    
    echo "Processing complete. The final PDF is: '$FINAL_PDF'"
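
The script assumes pdfimages (from poppler-utils) and mogrify and convert (from ImageMagick) are already installed. A small check near the top would fail early with a clear message instead of halfway through. This is a sketch only, and require_tools is a hypothetical helper name I picked for illustration:

```shell
#!/usr/bin/env bash
# Sketch: report every missing command and return non-zero if any are absent.
# require_tools is a hypothetical helper, not part of the original script.
require_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "Error: required command '$tool' not found in PATH." >&2
            missing=1
        fi
    done
    return "$missing"
}

# Possible use near the top of the script:
# require_tools pdfimages mogrify convert || exit 1
```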

    How to use this script:

    1. Save the code: Save the code above into a file, for example, process_pdf.sh.

    2. Make it executable: Open your terminal and navigate to the directory where you saved the file. Then run the command:
    $ chmod +x process_pdf.sh

    3. Run the script: Execute the script without any arguments:
    $ ./process_pdf.sh

     

    The script will first ask you:

    Enter the name of the input PDF file:

    Type the name of your PDF file and press Enter. Once the images have been processed, the script asks for the output file name. Press Enter without typing anything to use the default, cropped.pdf:

    Enter the desired name for the final PDF file (without extension):
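
One thing to watch with that second prompt: if you type the name with the extension, such as report.pdf, the script as posted will produce report.pdf.pdf. A small helper along these lines could handle both cases. It is only a sketch, and final_pdf_name is a hypothetical name, not something in the script above:

```shell
#!/usr/bin/env bash
# Sketch: build the output filename from what the user typed.
# final_pdf_name is a hypothetical helper, not part of the original script.
final_pdf_name() {
    # Empty input falls back to the script's default base name.
    local name="${1:-cropped}"
    # Drop a trailing .pdf so the extension is never doubled.
    name="${name%.pdf}"
    printf '%s.pdf\n' "$name"
}

# Possible use in the script, replacing the if/else around FINAL_PDF:
# FINAL_PDF=$(final_pdf_name "$OUTPUT_NAME")
```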

TAKs Shack