Python
YoungSkipper, 2012-09-25 22:46:33

sscanf in Python, or how to parse simple strings?

I'm learning Python and writing a small utility. I need to parse simple strings of text data. I want something simple, so that I can write something like this pseudocode:
(full_name, age, coof1, coof2) = "Name Surname 25 1/2".sscanf("{str} {int} {int}/{int}")
(coof1 ,coof2) = "1/2".sscanf("{int}/{int}")
(date, place) = "8:30 Place".sscanf("{datetime} {str}")
And if a string can't be parsed, I'd like to get an error somehow, saying that it didn't match...
Is there something like this?
If not, what is the right way to do it? As I understand it, the options are:
1. Use a module that emulates sscanf, which doesn't feel declarative or functional at all :)
2. Use regexps: you have to remember the syntax, and it takes a lot of code (write the regexp, compile it, ...).
3. Anything else?


5 answer(s)
leventov, 2012-09-25
@leventov

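import re

def scan_compile(pattern):
    # Escape regex metacharacters so literal text in the format matches as-is.
    pattern = re.escape(pattern)
    pattern = pattern.replace(re.escape('{str}'), '(.+?)')
    return re.compile(pattern.replace(re.escape('{int}'), r'(\d+)'))

def scan_match(r, s):
    # fullmatch, so a trailing {str} consumes the rest of the string.
    match = r.fullmatch(s)
    if match is None:
        raise ValueError('%r does not match %r' % (s, r.pattern))
    # All-digit groups become int (this also applies to an all-digit {str}).
    return [int(g) if g.isdigit() else g
            for g in match.groups()]

>>> from scan_match import *
>>> r = scan_compile("{str} {int} {int}/{int}")
>>> scan_match(r, 'Mary Rose Jesus 12 24/32')
['Mary Rose Jesus', 12, 24, 32]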

Pavel Tyslyatsky, 2012-09-25
@tbicr

Take a look at pyparsing and similar text-parsing libraries.

Sergey Lerg, 2012-09-25
@Lerg

split()?
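
For fixed-layout strings like the ones in the question, split() may well be enough. A minimal sketch, assuming the last two whitespace-separated fields are always the age and the fraction; parse_person is a hypothetical helper name, not something from the thread:

```python
def parse_person(line):
    # Split from the right so a multi-word name stays in one piece.
    full_name, age, fraction = line.rsplit(None, 2)
    coof1, coof2 = fraction.split('/')
    # int() raises ValueError on non-numeric fields, and tuple unpacking
    # raises ValueError when the number of fields is wrong.
    return full_name, int(age), int(coof1), int(coof2)


print(parse_person("Name Surname 25 1/2"))
```

The error on a mismatch comes for free as a ValueError, but the layout is hard-coded, so every new string shape needs its own function.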

YoungSkipper, 2012-09-25
@YoungSkipper

Oh! Regexes, but nicely wrapped. Actually, the libraries that emulate sscanf do about the same thing, for example code.activestate.com/recipes/502213-simple-scanf-implementation/

YoungSkipper, 2012-09-26
@YoungSkipper

But in general, something like this will do. It's my own code, so it's easier to control if anything goes wrong... The only open question is what to do with more complex data types, such as float or datetime.
For float, the pattern can be taken from that example: "([-+]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][-+]?\d+)?)", but it's the same story :)
And there is a problem with int too: this approach no longer works if the line contains something like 0xAB. In my particular case that isn't needed... but I'd better not forget about it later, since small utilities tend to grow :)
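
One possible way to cover the open cases above (float, int with a 0x prefix, a time of day) is a table of placeholder types, each with a regex fragment and a converter. This is only a sketch: FIELD_TYPES, to_int, scan_compile and scan are hypothetical names, {time} is an assumed placeholder for H:MM values like the "8:30" in the question, and the float pattern is the one quoted above with the stray space removed.

```python
import re
from datetime import datetime


def to_int(text):
    # Hex when the (optionally signed) literal starts with 0x, else decimal.
    is_hex = text.lstrip('+-').lower().startswith('0x')
    return int(text, 16 if is_hex else 10)


# Hypothetical table: placeholder name -> (regex with ONE capturing group, converter).
FIELD_TYPES = {
    'str': (r'(.+?)', str),
    'int': (r'([-+]?(?:0[xX][0-9a-fA-F]+|\d+))', to_int),
    'float': (r'([-+]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][-+]?\d+)?)', float),
    # Assumption: a bare H:MM or HH:MM time of day.
    'time': (r'(\d{1,2}:\d{2})',
             lambda text: datetime.strptime(text, '%H:%M').time()),
}


def scan_compile(fmt):
    """Turn a format such as "{str} {int}" into (compiled regex, converters)."""
    parts, converters, pos = [], [], 0
    for m in re.finditer(r'\{(\w+)\}', fmt):
        # Literal text between placeholders is escaped and matched as-is.
        parts.append(re.escape(fmt[pos:m.start()]))
        regex, converter = FIELD_TYPES[m.group(1)]  # KeyError on unknown type
        parts.append(regex)
        converters.append(converter)
        pos = m.end()
    parts.append(re.escape(fmt[pos:]))
    return re.compile(''.join(parts)), converters


def scan(fmt, text):
    regex, converters = scan_compile(fmt)
    match = regex.fullmatch(text)
    if match is None:
        raise ValueError('%r does not match format %r' % (text, fmt))
    return tuple(conv(g) for conv, g in zip(converters, match.groups()))


print(scan("{str} {int} {int}/{int}", "Name Surname 25 1/2"))
print(scan("{int}", "0xAB"))
print(scan("{time} {str}", "8:30 Place"))
```

Because conversion is driven by the placeholder rather than by isdigit(), a {str} field that happens to be all digits stays a string, and adding a new type is a one-line change to the table.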
