Software To Create Searchable Pdf

Active 1 year, 7 months ago

Is there any freeware OCR software (for Linux and/or Windows) that can take a PDF scanned document as input and output a Searchable PDF like Adobe Acrobat does?

By searchable PDF I mean that the OCRed text is laid invisibly over the original text and can be selected with the mouse and copied.

I know that gscan2pdf on Linux can do something like this, but the text is placed in the top left corner of the page and is far too small, not at all aligned with the text on the scanned page behind it. This is because gscan2pdf feeds the whole page to the OCR engine. It should instead split the image into smaller images, each containing a single line of text or a short paragraph, and send those to the OCR software.
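
To illustrate the alignment problem: an invisible text layer only lines up with the scan if each recognized line is placed at the position where it was found. Here is a minimal sketch of that coordinate conversion, assuming the OCR engine reports boxes in scan pixels with the origin at the top left, and relying on the fact that PDF pages are measured in points (72 per inch) from the bottom left. The names `Box` and `pixel_box_to_pdf` are made up for this example and do not come from gscan2pdf or any other tool.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Rectangle given by its left, top, right and bottom edges."""
    left: float
    top: float
    right: float
    bottom: float


def pixel_box_to_pdf(box: Box, page_height_px: int, dpi: float) -> Box:
    """Convert an OCR box in scan pixels (origin top-left, y grows downward)
    to PDF points (origin bottom-left, y grows upward, 72 points per inch)."""
    scale = 72.0 / dpi
    return Box(
        left=box.left * scale,
        top=(page_height_px - box.top) * scale,
        right=box.right * scale,
        bottom=(page_height_px - box.bottom) * scale,
    )


# A line of text found 300 px below the top edge of a 300 dpi A4 scan
# (2480 x 3508 px), 60 px tall.
line = Box(left=300, top=300, right=2180, bottom=360)
placed = pixel_box_to_pdf(line, page_height_px=3508, dpi=300)
print(placed)
print("approximate text height in points:", placed.top - placed.bottom)
```

If the converted box is ignored and every string is written at a fixed position in a tiny font, you get the symptom described above: all the text bunched in one corner of the page.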

Nicolas Raoul

Cornelius

10 Answers

A tool that lets you do that is PDF-XChange Viewer. The free version will allow you to OCR your document in a variety of languages (you can download additional language packs for free) and add the OCR'd text as an overlay text layer you can copy from and search with CTRL+F.

fast PDF viewer with a lot of features
fast OCR engine (unless you choose the best accuracy)
a lot of options have the PRO icon next to them (available only on the Pro version) but you can hide them
color management and custom screen DPI settings
Windows only application, which doesn't seem to work on Wine (the viewer works, but the OCR function makes it crash)

What it doesn't do:

the OCR doesn't take advantage of multiple cores
OCR doesn't detect character styles (bold, italic) or the copy function loses them
it doesn't use the correct Romanian diacritics, but that can be fixed if you copy the text into an editor and do a search and replace.
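
The search and replace can also be scripted. The characters this answer actually replaced are not shown here, so the mapping below is an assumption: it covers a common case where OCR output contains the cedilla letters (ş, ţ) instead of the comma-below letters (ș, ț) used in standard Romanian. Adjust the table to whatever wrong characters you actually see in your copied text.

```python
# Assumed mapping (not taken from the original answer): cedilla forms that
# often appear in OCR output, replaced by the comma-below Romanian letters.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015E": "\u0218",  # S with cedilla -> S with comma below
    "\u015F": "\u0219",  # s with cedilla -> s with comma below
    "\u0162": "\u021A",  # T with cedilla -> T with comma below
    "\u0163": "\u021B",  # t with cedilla -> t with comma below
})


def fix_romanian_diacritics(text: str) -> str:
    """Return the text with the assumed wrong letters replaced."""
    return text.translate(CEDILLA_TO_COMMA)


# Text as it might be copied out of the OCR layer.
copied = "\u015Fi \u0163ar\u0103"
print(fix_romanian_diacritics(copied))
```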

Cornelius

Guido Domenici

Try pdfsandwich. From the man-page:

pdfsandwich generates 'sandwich' OCR pdf files, i.e. pdf files which contain only images (no text) will be processed by optical character recognition (OCR) and the text will be added to each page invisibly 'behind' the images.

pdfsandwich is a command line utility. If you have a scanned pdf file, for instance this one: alice.pdf (which is the first chapter of a novel you might have heard of), invoke pdfsandwich like this:

This will generate a file alice_ocr.pdf, which looks like the original file, but the recognized text will be placed behind the scanned images. You can now run full-text searches or select text areas.
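
If you want to drive this from a script, here is a minimal Python sketch. It assumes pdfsandwich is installed and on your PATH, and that it is run with its defaults, i.e. with the PDF file as the only argument. The helper names are made up for this example.

```python
import subprocess
from pathlib import Path


def pdfsandwich_command(pdf: Path) -> list[str]:
    """Build the command line for a default pdfsandwich run."""
    return ["pdfsandwich", str(pdf)]


def expected_output(pdf: Path) -> Path:
    """Default output name: the input name with '_ocr' before the extension."""
    return pdf.with_name(pdf.stem + "_ocr" + pdf.suffix)


def run_pdfsandwich(pdf: Path) -> Path:
    """Run pdfsandwich on one file and return where the result should be."""
    subprocess.run(pdfsandwich_command(pdf), check=True)
    return expected_output(pdf)


if __name__ == "__main__":
    # Only prints what would be run; call run_pdfsandwich() to do the OCR.
    print(" ".join(pdfsandwich_command(Path("alice.pdf"))))
    print(expected_output(Path("alice.pdf")))
```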

Another option might be OCRmyPDF.

student

The newer version of Tesseract (3.03 RC at the time of writing this) can do this:

free, open source and cross-platform
PDF output is available starting from version 3.03
CLI software
multiple language support
unfortunately, it takes a single image as input, so to make a complete document you must write a batch script that converts each page image to a searchable PDF. After that, the PDF pages should be combined into a single PDF using a tool like pdftk.

This is the command:
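
The exact command from this answer is not reproduced here. As a hedged sketch of the batch workflow described in the list, the Python below assumes the usual Tesseract calling convention (`tesseract IMAGE OUTPUTBASE -l LANG pdf`, which writes OUTPUTBASE.pdf) and the pdftk `cat ... output` form for merging. The function names and file names are made up for this example, and both tools must be installed and on your PATH for `ocr_document` to work.

```python
import subprocess
from pathlib import Path


def tesseract_command(image: Path, lang: str = "eng") -> list[str]:
    """Command that turns one page image into a searchable one-page PDF.

    The output base is the image path without its extension, so
    page1.tif becomes page1.pdf.
    """
    output_base = image.with_suffix("")
    return ["tesseract", str(image), str(output_base), "-l", lang, "pdf"]


def pdftk_merge_command(pages: list[Path], combined: Path) -> list[str]:
    """Command that concatenates the page PDFs, in the given order."""
    return ["pdftk", *map(str, pages), "cat", "output", str(combined)]


def ocr_document(images: list[Path], combined: Path, lang: str = "eng") -> None:
    """OCR every page image, then merge the resulting PDFs into one file."""
    pages = []
    for image in images:
        subprocess.run(tesseract_command(image, lang), check=True)
        pages.append(image.with_suffix(".pdf"))
    subprocess.run(pdftk_merge_command(pages, combined), check=True)


if __name__ == "__main__":
    # Only prints the commands; call ocr_document() to actually run them.
    print(" ".join(tesseract_command(Path("page01.tif"))))
    print(" ".join(pdftk_merge_command(
        [Path("page01.pdf"), Path("page02.pdf")], Path("document.pdf"))))
```

Pages are merged in the order you pass them, so zero-padded file names (page01, page02, ...) combined with `sorted()` keep the document in order.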

Cornelius

pypdfocr is what worked for me. It is a Python script streamlining the whole Tesseract usage. After getting dependencies installed (on Linux it's a much simpler process) it's as simple as typing:

pypdfocr myfile.pdf

And opening myfile_ocr.pdf a while later.
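
For a whole folder of scans, the same call can be looped. This is only a sketch: it assumes pypdfocr is on your PATH and writes myfile_ocr.pdf next to the input file, as in the example above, and the helper names are made up.

```python
import subprocess
from pathlib import Path


def pdfs_needing_ocr(folder: Path) -> list[Path]:
    """PDFs in `folder` that are not OCR outputs and have no '_ocr' sibling yet."""
    todo = []
    for pdf in sorted(folder.glob("*.pdf")):
        if pdf.stem.endswith("_ocr"):
            continue  # already an output of a previous run
        if pdf.with_name(pdf.stem + "_ocr.pdf").exists():
            continue  # this input has been processed before
        todo.append(pdf)
    return todo


def ocr_folder(folder: Path) -> None:
    """Run pypdfocr once for every PDF that still needs a text layer."""
    for pdf in pdfs_needing_ocr(folder):
        subprocess.run(["pypdfocr", str(pdf)], check=True)
```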

Zaroth

I use Microsoft OneNote as an OCR tool. When you right-click an image, it can copy all the text in the image, and it can also search for text within images. It is free and accurate, runs on Windows, and supports almost all image formats.

It can also search through PDF files, and Images in PDF files.

Bonus point: it supports multiple languages :) such as English, French and Spanish.

BarathVutukuri

https://www.microsoft.com/en-us/store/p/leadtools-ocr/9wzdncrdr0d5 is a small simple WinRT app (runs fine on Win10 as well) that does nothing more than take an image or pdf and output a sandwich PDF or text. It's kinda ugly and has absolutely no configuration, but it does this one small task perfectly.

James Polley

You can get searchable text using Google Drive.

First, choose a key setting. Under 'General' in your Google Drive settings, check the box next to 'Convert uploads: Convert uploaded files to Google Docs editor format.'

Now upload the PDF to your Google Drive (click 'New', then 'File upload'). When the upload is complete (it might take a minute or two), right-click it. (If you have trouble finding it, try hitting 'Recent' in the left-hand sidebar.) As I was saying, right-click the PDF you uploaded and choose 'Open with', then 'Google Docs'. Now you will have searchable text.

aparente001

While the other answers on this thread focus on desktop software, I've had a lot of success with this webservice: http://www.searchablepdfs.org/

It allows you to upload a PDF of a scanned document, and it generates a 'sandwich PDF' with embedded OCR text that you can copy/paste.

Pros:

Fast
High quality OCR text recognition (the results I've gotten have been at least as good as what I've been able to get from using tesseract, which Cornelius mentioned)
Cross-platform (it's a web application so you don't need to install any software yourself)
Free

Cons:

Only supports English documents
Only processes up to 10 pages per file

calvinyoung

Another option is pdf2pdfocr (https://github.com/LeoFCardoso/pdf2pdfocr), which is based on Tesseract-OCR and runs natively on Windows, macOS and Linux.

Disclaimer: I'm the pdf2pdfocr developer.

Leo Cardoso

Two more options:

1) Online: www.sandwichpdf.com

2) Desktop (multiple OSes): NAPS2 - https://www.naps2.com/

kpk
