[Phpwm] Full-text content searching

Tim Williams T.M.Williams at cs.bham.ac.uk
Mon Dec 8 17:02:22 UTC 2008


On Mon, 8 Dec 2008, Ian Munday wrote:

> Does anyone have any experience of performing full-text content
> searches within Microsoft Office documents (in particular Word and
> Excel) and PDF files using PHP?  If so, could they recommend a
> solution?  (The files themselves are currently stored within a MySQL
> database, although they could be moved out if need be.)

The global search function in Moodle uses pdftotext and antiword to get a 
text dump of such files, and then feeds that text into a Lucene search 
index. If you want to keep everything in the database, you could use a 
similar approach: create a text version of each file and insert it into 
the MySQL database as an indexable text field (e.g. one with a FULLTEXT 
index).
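
As a rough illustration of that approach, here is a minimal sketch. It is
written in Python only to keep it short; the same steps map directly onto
PHP. It assumes the pdftotext (Xpdf/Poppler) and antiword command-line
tools are installed and on the PATH. The helper names, the document_text
table and its columns are all made up for the example, not taken from
Moodle, and the %s placeholder style depends on the MySQL driver in use.

```python
import subprocess
from pathlib import Path

# Converter commands keyed by file extension (assumed tools, see above).
# The trailing "-" tells pdftotext to write to stdout; antiword prints
# to stdout by default.
EXTRACTORS = {
    ".pdf": lambda path: ["pdftotext", "-enc", "UTF-8", path, "-"],
    ".doc": lambda path: ["antiword", path],
}


def extractor_command(path):
    """Return the argv list that dumps `path` as text, or None if unsupported."""
    build = EXTRACTORS.get(Path(path).suffix.lower())
    return build(str(path)) if build else None


def extract_text(path):
    """Run the matching converter and return its output as a string."""
    command = extractor_command(path)
    if command is None:
        raise ValueError(f"no text extractor configured for {path}")
    result = subprocess.run(command, capture_output=True, check=True)
    return result.stdout.decode("utf-8", errors="replace")


# Hypothetical schema: one row of extracted text per stored document.
# MyISAM is used because older MySQL versions only support FULLTEXT there.
CREATE_TABLE_SQL = """
CREATE TABLE document_text (
    document_id INT UNSIGNED NOT NULL PRIMARY KEY,
    body        LONGTEXT NOT NULL,
    FULLTEXT KEY ft_body (body)
) ENGINE=MyISAM
"""

# Hypothetical search query; bind the search phrase as the parameter.
SEARCH_SQL = """
SELECT document_id
FROM document_text
WHERE MATCH(body) AGAINST (%s)
"""
```

The idea would be to call extract_text() once when a file is uploaded,
store the result in the body column, and run the MATCH ... AGAINST query
at search time, so the original binary file never has to be parsed during
a search.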

I've also used OpenOffice as a server process to automatically convert 
uploaded documents between formats. That was written in Java, though; 
I'm not sure whether there are any PHP classes for the OpenOffice API.

Tim W

-- 
Tim Williams BSc MSc MBCS - Euromotor Autotrain
Web : http://www.autotrain.org
Tel : +44 (0)121 414 2214 (ext 42214 on internal phone)


