I used a Mac as my primary OS for years and have accumulated a lot of .pages files. Does anyone know the simplest way to convert them to .txt or some other format? I am hoping for something that can handle batches of files (there are a lot of them) and that runs on Arch or Win7. Thanks.
From the information on Wikipedia, http://en.wikipedia.org/wiki/Pages#Compatibility, this method may work. No guarantees.
Try copying one of the '.pages' files and renaming the copy to a '.zip' extension, then unzip it. You may find a '.pdf' or '.jpg' file inside if the file was saved with previews enabled. You should also find an XML file that contains some form of the actual text.
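To check what is inside a file before converting anything, something like the following may help. It is only a sketch: the function name list_pages_members is made up for illustration, and it assumes the .pages file really is a zip archive, as described above. Python's zipfile module looks at the file contents rather than the extension, so the rename step can be skipped.

```python
import zipfile


def list_pages_members(path):
    """Return the names of the files stored inside a .pages archive.

    Hypothetical helper: returns an empty list if the file is not a zip
    archive at all.
    """
    # is_zipfile() inspects the file contents, not the extension,
    # so the file does not have to be renamed to .zip first.
    if not zipfile.is_zipfile(path):
        return []
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()
```

Calling list_pages_members("somefile.pages") should show whether there is a preview PDF, a JPEG, or an XML file to work with.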
If needed, two tools that may work to convert the XML to plain text are 'xmlto' in extra and 'xml2' in community. I have no experience with either.
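If neither tool is handy, a crude alternative is to strip the markup with Python's standard library. This is a rough sketch, not a proper converter: the member name "index.xml" is an assumption about how the archive is laid out, the function name is made up, and the output is simply every piece of text in the XML, one per line, with no attempt to tell body text from anything else.

```python
import xml.etree.ElementTree as ET
import zipfile


def pages_xml_to_text(path, member="index.xml"):
    """Pull all text nodes out of an XML member of a .pages archive.

    "index.xml" is an assumed member name; pass a different one if the
    archive uses something else. Raises KeyError if the member is missing.
    """
    with zipfile.ZipFile(path) as archive:
        with archive.open(member) as handle:
            root = ET.parse(handle).getroot()
    # itertext() walks every element and yields its text in document order.
    chunks = (chunk.strip() for chunk in root.itertext())
    return "\n".join(chunk for chunk in chunks if chunk)
```

Expect some noise in the result, since any text stored in the XML comes out, not only the paragraphs of the document.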
It doesn't sound hard to write a script for this. Someone has probably done it before, but I couldn't find an example.
I had success by renaming .pages to .zip and unzipping. Inside the QuickLook folder there is a .pdf containing the text, and running pdftotext on it output the plain text of the original document. Sometimes a .pages file unpacks differently, though, and there is no PDF. In my case there was still a .jpg, so I converted it to .tif with ImageMagick and used tesseract to OCR the text from the image.
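For a whole folder of files, the PDF route above could be scripted roughly like this. Treat it as a sketch under assumptions: the member name "QuickLook/Preview.pdf" is what I would expect the preview to be called but may differ in your files, the function names are made up, and pdftotext has to be installed and on the PATH. Files without a preview PDF are returned in a list so they can be handled some other way, such as the OCR route.

```python
import shutil
import subprocess
import zipfile
from pathlib import Path

# Assumed location of the preview inside the archive.
PREVIEW = "QuickLook/Preview.pdf"


def extract_preview_pdf(pages_path, out_dir):
    """Copy the preview PDF out of one .pages file.

    Returns the path of the extracted PDF, or None if the file is not a
    zip archive or has no preview PDF.
    """
    pages_path = Path(pages_path)
    out_dir = Path(out_dir)
    if not zipfile.is_zipfile(pages_path):
        return None
    with zipfile.ZipFile(pages_path) as archive:
        if PREVIEW not in archive.namelist():
            return None
        out_dir.mkdir(parents=True, exist_ok=True)
        target = out_dir / (pages_path.stem + ".pdf")
        target.write_bytes(archive.read(PREVIEW))
    return target


def convert_folder(src_dir, out_dir):
    """Run pdftotext on the preview of every .pages file in src_dir.

    Returns the list of files that had no usable preview PDF.
    """
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    skipped = []
    for pages_path in sorted(Path(src_dir).glob("*.pages")):
        pdf = extract_preview_pdf(pages_path, out_dir)
        if pdf is None:
            skipped.append(pages_path)
            continue
        # pdftotext takes the input PDF followed by the output text file.
        subprocess.run(
            ["pdftotext", str(pdf), str(pdf.with_suffix(".txt"))],
            check=True,
        )
    return skipped
```

Something like convert_folder("pages_files", "converted") would leave a .pdf and a .txt per document in the output folder; both folder names here are placeholders.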