Python
ghostku, 2015-11-16 01:46:57

How to replace all spaces inside text delimited by a specific character?

I have a fragment of an HTML document:

...
<img alt='fdsfsd' src='text1 text2'/>
adasd 
<img alt='sadsdd' src='text3 text4'/>
adasdas
...

I need to replace every space in the file path (the src value) with an underscore and prepend a prefix to the path. The result should be:
...
<img alt='fdsfsd' src='CONST_text1_text2'/>
adasd 
<img alt='sadsdd' src='CONST_text3_text4'/>
adasdas
...

The language is Python, using the standard re module.

1 answer(s)
nirvimel, 2015-11-16
@ghostku

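  1. You can't parse HTML with regular expressions.
  2. Use an HTML parser such as lxml instead, and apply the regular expression only to the attribute value:

    import re
    from lxml import etree

    # Note: the src values here already start with CONST,
    # unlike the input shown in the question.
    doc = """
        <body>
        <img alt='fdsfsd' src='CONST text1 text2'/>
        adasd
        <img alt='sadsdd' src='CONST text3 text4'/>
        adasdas
        </body>
        """

    # Parse with lxml's lenient HTML parser.
    tree = etree.fromstring(doc, parser=etree.HTMLParser())

    # Visit every <img> that has a src attribute and replace each
    # run of whitespace in its value with a single underscore.
    for img in tree.xpath('//img[@src]'):
        img.attrib['src'] = re.sub(r'\s+', '_', img.attrib['src'])

    # Serialize the modified tree to inspect the result.
    print(etree.tostring(tree, encoding='unicode', method='html'))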
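
The snippet above starts from src values that already contain CONST. For the narrow case in the question (the original input, the standard re module only), a minimal sketch might look like the following. It is not a general HTML solution: it assumes every src value sits inside an img tag, is wrapped in matching single or double quotes, and contains no embedded quote characters. The names PREFIX, IMG_SRC_RE and fix_src are hypothetical, chosen for illustration.

```python
import re

# Hypothetical prefix, taken from the expected output in the question.
PREFIX = 'CONST_'

# Group 1: everything from "<img" up to and including "src=".
# Group 2: the opening quote character. Group 3: the path itself.
# Assumption: the value is quoted and has no embedded quotes.
IMG_SRC_RE = re.compile(r"""(<img\b[^>]*?\ssrc=)(['"])(.*?)\2""", re.IGNORECASE)


def fix_src(html):
    """Prefix each img src value and replace whitespace runs with '_'."""
    def repl(match):
        head, quote, path = match.group(1), match.group(2), match.group(3)
        new_path = PREFIX + re.sub(r'\s+', '_', path.strip())
        return head + quote + new_path + quote
    return IMG_SRC_RE.sub(repl, html)


html = (
    "<img alt='fdsfsd' src='text1 text2'/>\n"
    "adasd\n"
    "<img alt='sadsdd' src='text3 text4'/>"
)
result = fix_src(html)
print(result)
```

Passing a function as the replacement lets the inner re.sub touch only the captured path, so spaces elsewhere in the tag are left alone. For real-world markup (unquoted attributes, quotes inside values, comments), a parser-based approach like the one in the answer is the safer choice.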
