
#1 2010-05-31 16:58:48

panosk
Member
From: Athens, Greece
Registered: 2008-10-29
Posts: 241

python and xpath

Hello,

I am writing a Python program that imports XML files into SQLite databases. Some of the text in the XML files contains internal tags that I want to keep, but my program only imports the text up to the first occurrence of an internal tag. Also, if the text starts with an internal tag, Python just imports an empty record. What I actually want is to store the entire content of the <seg> element, including its internal tags.

Here is the relevant part of my program:

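from lxml import etree  # assumed: doc.xpath() is provided by lxml

# xmldoc (path to the XML file) and cur (sqlite3 cursor) are defined elsewhere
doc = etree.parse(xmldoc)
print("Starting importing file into the sqlite database....")

# Each <tuv> holds one language variant of a translation unit
for tu in doc.xpath("//tuv"):
    if tu.xpath('@xml:lang="el"'):
        # Greek variant: the target text
        trgtext = tu.findtext(".//seg")
    elif tu.xpath('@xml:lang="en-us"'):
        # English variant: the source text; it follows the Greek <tuv> in
        # the file, so trgtext is already set when the row is inserted
        srctext = tu.findtext(".//seg")
        cur.execute('INSERT INTO project (source,target) VALUES (?,?)', (srctext,trgtext))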
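
The behaviour described above is consistent with how ElementTree-style APIs model mixed content: findtext() returns only the .text of the matched element, that is, the characters before its first child. The remaining text is stored on the children, in each child's .text and .tail. Here is a small demonstration. It assumes the standard-library xml.etree.ElementTree rather than lxml (lxml follows the same text/tail model), and the sample strings are shortened, made-up variants of the segments in the file:

```python
import xml.etree.ElementTree as ET

# Segment with an internal tag in the middle of the text
tuv = ET.fromstring('<tuv><seg>Microsoft<ph x="1">{1}</ph> Access 2003</seg></tuv>')
seg = tuv.find(".//seg")

leading = tuv.findtext(".//seg")      # only the text before the first child
child_tail = seg[0].tail              # text that follows </ph>
all_text = "".join(seg.itertext())    # every text node, but the tags are lost

# Segment that starts with an internal tag: seg.text is None,
# so findtext() falls back to an empty string
tuv2 = ET.fromstring('<tuv><seg><ph x="1">{1}</ph>Access</seg></tuv>')
empty = tuv2.findtext(".//seg")
```

So itertext() recovers all the characters, but not the <ph> markup itself.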

And here is a relevant part of the xml file:

 <tu
         tuid="19"
         datatype="Text"
         srclang="en-us"
      >
         <prop type="x-Client">41</prop>
         <prop type="x-Domain">1123</prop>
         <prop type="x-Project">2683527</prop>
         <tuv
            xml:lang="el"
            creationdate="20040513T173503Z"
            creationid="Panos"
         >
            <seg>Λήψη βοήθειας</seg>
         </tuv>
         <tuv
            xml:lang="en-us"
            creationdate="20040513T173503Z"
            creationid="Panos"
         >
            <seg>Getting Help</seg>
         </tuv>
      </tu>
      <tu
         tuid="20"
         datatype="Text"
         srclang="en-us"
      >
         <prop type="x-Client">41</prop>
         <prop type="x-Domain">1123</prop>
         <prop type="x-Project">2683527</prop>
         <tuv
            xml:lang="el"
            creationdate="20040513T173551Z"
            creationid="Panos"
         >
            <seg>Η Microsoft<ph x="1">{1}</ph>®<ph x="2">{2}</ph> <ph x="3">{3}</ph>Access 2003 είναι ένα πρόγραμμα βάσεων δεδομένων που σας επιτρέπει να αποθηκεύετε και να διαχειρίζεστε μεγάλες συλλογές πληροφοριών.</seg>
         </tuv>
         <tuv
            xml:lang="en-us"
            creationdate="20040513T173551Z"
            creationid="Panos"
         >
            <seg>Microsoft<ph x="1">{1}</ph>®<ph x="2">{2}</ph> <ph x="3">{3}</ph>Access 2003 is a database program that allows you to store and manage large collections of information.</seg>
         </tuv>
      </tu>
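
One possible way to keep the internal tags is to serialize the children of <seg> back to markup and join them with the leading text. The following is only a sketch under stated assumptions. It uses the standard-library xml.etree.ElementTree and an in-memory SQLite database so that it runs on its own. The names inner_xml, segments, SAMPLE and XML_LANG are hypothetical helpers, not part of the program above, and SAMPLE is a trimmed-down variant of the file shown. It also pairs source and target per <tu> instead of relying on the order of the <tuv> elements.

```python
import sqlite3
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# ElementTree exposes xml:lang under the predefined XML namespace
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical, shortened sample modelled on the file above
SAMPLE = """<body>
  <tu tuid="20">
    <tuv xml:lang="el"><seg>Η Microsoft<ph x="1">{1}</ph>® Access 2003</seg></tuv>
    <tuv xml:lang="en-us"><seg><ph x="1">{1}</ph>Microsoft® Access 2003</seg></tuv>
  </tu>
</body>"""


def inner_xml(elem):
    """Return the content of elem as markup, without elem's own tags."""
    # Leading text (None when the element starts with a child tag)
    parts = [escape(elem.text or "")]
    for child in elem:
        # tostring() serializes the child and the tail text that follows it
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


def segments(root, src_lang="en-us", trg_lang="el"):
    """Yield (source, target) pairs, one per <tu> that has both languages."""
    for tu in root.iter("tu"):
        by_lang = {tuv.get(XML_LANG): tuv.find("seg") for tuv in tu.findall("tuv")}
        src, trg = by_lang.get(src_lang), by_lang.get(trg_lang)
        if src is not None and trg is not None:
            yield inner_xml(src), inner_xml(trg)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE project (source TEXT, target TEXT)")
rows = list(segments(ET.fromstring(SAMPLE)))
conn.executemany("INSERT INTO project (source, target) VALUES (?, ?)", rows)
conn.commit()
```

With lxml the same idea should carry over, since etree.tostring(child, encoding="unicode") is available there as well and also includes the tail text by default.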

Thanks in advance for any suggestions.
