It would be nice if Groovy offered "raw strings" like Python (r'C:\Documents and Settings\...') or C#, which lets you prepend an '@' to have backslashes treated literally, especially when it comes to Windows pathnames, but whatever.
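For what it's worth, Groovy's slashy strings come fairly close to a raw string: inside /.../ a backslash is literal unless it precedes a forward slash. A minimal sketch, reusing the path from this post (the variable names are only for illustration):

```groovy
// Slashy string: backslashes are taken literally, so \t here is not a tab.
def slashy = /C:\Documents and Settings\tnassar\My Documents\Efficient.docx/

// The same path as an ordinary single-quoted string with doubled backslashes.
def escaped = 'C:\\Documents and Settings\\tnassar\\My Documents\\Efficient.docx'

assert slashy == escaped
println slashy
```

One caveat: a slashy string cannot end with a backslash, so a directory path with a trailing separator still needs the escaped form.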
import java.util.zip.*
docx = new File('C:\\Documents and Settings\\tnassar\\My Documents\\Efficient.docx')
// A .docx file is a ZIP archive; the main document body lives in word/document.xml.
zip = new ZipFile(docx)
entry = zip.getEntry('word/document.xml')
stream = zip.getInputStream(entry)
// The namespace was gleaned from the decompressed XML.
wordMl = new XmlSlurper().parse(stream).declareNamespace(w: 'http://schemas.openxmlformats.org/wordprocessingml/2006/main')
// The outermost XML element node is assigned to the variable wordMl, so
// GPath expressions will start after that. To print out the concatenated
// descendant text nodes of w:body, you use:
text = wordMl.'w:body'.children().collect { it.text() }.join('')
println text
This will not work well for complex document formats (I can imagine that tables and such would be a disaster), but for me it was just enough.
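For slightly messier documents, one possible refinement is to collect the text paragraph by paragraph instead of gluing everything together. The sketch below is not part of the original script: the helper name `paragraphs` and the inline sample XML are made up for illustration, and it assumes that each `w:p` element is one paragraph, which also catches paragraphs nested inside table cells.

```groovy
// Needed on Groovy 3+; on Groovy 2.x drop this import (XmlSlurper is auto-imported there).
import groovy.xml.XmlSlurper

// Hypothetical helper: returns one string per w:p paragraph, in document order.
def paragraphs(String xml) {
    def doc = new XmlSlurper().parseText(xml)
        .declareNamespace(w: 'http://schemas.openxmlformats.org/wordprocessingml/2006/main')
    // depthFirst() visits every descendant of w:body; name() is the local name without the prefix.
    doc.'w:body'.depthFirst().findAll { it.name() == 'p' }.collect { it.text() }
}

// Tiny stand-in for word/document.xml: one paragraph split across two runs,
// followed by a one-cell table.
String sample = '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"><w:body>' +
    '<w:p><w:r><w:t>Effi</w:t></w:r><w:r><w:t>cient</w:t></w:r></w:p>' +
    '<w:tbl><w:tr><w:tc><w:p><w:r><w:t>cell</w:t></w:r></w:p></w:tc></w:tr></w:tbl>' +
    '</w:body></w:document>'

paragraphs(sample).each { println it }
```

With a real file, the string passed to the helper would be the text of the word/document.xml entry. Table structure is still lost, since each cell just becomes its own line, so this remains a rough extraction rather than a faithful conversion.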