21 Apr 2009, 7:32 p.m.
On Apr 21, 2009, at 6:25 AM, Rui Carneiro wrote:
Does anyone know of good libraries for handling the content of files like
pdf, ppt, doc, etc.? I am already indexing attachments; all I need now is
to extract their text.
I've no idea, but you could at least look at some of the other full-text
search engines. I remember them advertising indexing support for all
kinds of formats. Maybe they're using some specific library, or maybe it
would be easy to extract their parsing code.