0. Prerequisites
$sudo apt install img2pdf
$sudo apt install ocrmypdf
1. Download the images with a script
for n in $(seq 1 23)
do
sn=$(printf "%02d" "$n") #zero-padded page number: 01, 02, ...
echo "$sn"
wget "https://larepublica.cronosmedia.glr.pe/printed/2021/07/01/lima/pages/$sn.jpeg"
done
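The loop above hard-codes the date and the page count. A minimal sketch of a reusable variant follows; page_urls is a hypothetical helper name, and it assumes the URL pattern shown above stays the same for other edition dates, which the original does not confirm.

```shell
#!/bin/sh
# Sketch: print one page URL per line for a given edition.
# page_urls is a hypothetical helper; the URL pattern is copied from the
# loop above and is only assumed to hold for other dates.
page_urls() {
    date_path=$1   # edition date as a path, e.g. 2021/07/01
    pages=$2       # number of pages, e.g. 23
    base="https://larepublica.cronosmedia.glr.pe/printed"
    n=1
    while [ "$n" -le "$pages" ]; do
        # %02d zero-pads the page number: 01, 02, ...
        printf '%s/%s/lima/pages/%02d.jpeg\n' "$base" "$date_path" "$n"
        n=$((n + 1))
    done
}
# Usage: feed the list to wget, which reads URLs from stdin with -i -
#   page_urls 2021/07/01 23 | wget -i -
```

Printing the URLs first makes it easy to check the list before anything is downloaded.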
2. Convert the images to PDF (after running the download script with sh)
$img2pdf *.jpeg --output rep1.pdf #or
$convert *.jpeg rep2.pdf
$ocrmypdf rep1.pdf rep1_ocr.pdf #adds an OCR text layer; can also reduce size
Note for the convert command: if ImageMagick's security policy blocks PDF output, edit the policy file and change rights from none to read|write:
$sudo vim /etc/ImageMagick-6/policy.xml
<policy domain="coder" rights="none" pattern="PDF" />
<policy domain="coder" rights="read|write" pattern="PDF" />
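The same policy change can be made without opening an editor. This is a sketch only: allow_pdf is a hypothetical helper name, and it assumes GNU sed (for -i) and that the line in policy.xml matches the one shown above exactly.

```shell
#!/bin/sh
# Sketch: switch the PDF coder policy from "none" to "read|write".
# allow_pdf is a hypothetical helper; pass it the path to policy.xml.
# Assumes GNU sed (-i) and the exact attribute order shown above.
allow_pdf() {
    policy_file=$1
    # keep a backup next to the original, then edit in place
    cp "$policy_file" "$policy_file.bak" &&
    sed -i 's#rights="none" pattern="PDF"#rights="read|write" pattern="PDF"#' "$policy_file"
}
# /etc/ImageMagick-6/policy.xml is a system file, so run the script as root:
#   allow_pdf /etc/ImageMagick-6/policy.xml
```

The .bak copy allows the original policy to be restored if convert still fails for another reason.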
Bonus
[1] El Pueblo newspaper PDF URL: https://www.diarioelpueblo.com.pe/wp-content/uploads/2021/07/01-07-2021.pdf