Python
Vlad Zaitsev, 2020-02-12 20:14:02

How to parse ms/s units in pyparsing and python?

There are strings such as "1m57s", "17s520ms", "1m", "10s", "200ms" and so on. I need to parse them with pyparsing into minutes, seconds and milliseconds.

minutes = ( Word(nums, max=8) + Suppress(Literal("m")) + Suppress(CharsNotIn("s") or Empty()) ) ('minutes')
seconds = ( Word(nums, max= 8) + Suppress(Literal("s")) ) ('seconds')
mseconds = ( Word(nums, max=8) + Suppress(Literal("ms")) ) ('mseconds')
I tried the above, but either strings of the form "1m" are not matched, or the first part of a string like "400ms" (the "400m") is matched and mistakenly treated as minutes.

Here is the code to check:

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

from pyparsing import *

strings = {"1m57s", "17s520ms", "1m", "10s", "200ms"}

def parse_second(string):
  minutes = ( Word(nums, max=8) + Suppress(Literal("m")) + (Suppress(CharsNotIn("s") or Empty())) ) ('minutes')
  seconds = ( Word(nums, max=8) + Suppress(Literal("s")) ) ('seconds')
  time_minutes = minutes
  time_seconds = seconds
  string = string.rstrip()
  minutes_count = 0
  seconds_count = 0

  try:
    parsed_time_minutes = time_minutes.parseString(string)
    if (parsed_time_minutes.minutes != ""):
      minutes_count = int(parsed_time_minutes.minutes.asList()[0])
  except BaseException:
    pass

  try:
    parsed_time_seconds = time_seconds.parseString(string)
    if (parsed_time_seconds.seconds != ""):
      seconds_count = int(parsed_time_seconds.seconds.asList()[0])
  except BaseException:
    pass

  all_sec = (minutes_count*60) + seconds_count
  return all_sec

def parse():
  for element in strings:
    result = parse_second(element)
    print(element, result)

parse()


Result:

('1m57s', 60) ✗ Wrong, should be 60+57=117
('10s', 10) ✓ Correct
('1m', 0) ✗ Wrong, should be 60
('17s520ms', 17) ✓ Correct, whole seconds - 17
('200ms', 0) ✓ Correct, whole seconds - zero

2 answer(s)
Vlad Grigoriev, 2020-02-12
@Vaindante

I quickly threw this together with a plain regular expression; from there, do whatever you want with the results.

import re
from dataclasses import dataclass


@dataclass
class Time:
    minute: int
    second: int
    ms: int

    @classmethod
    def init(cls, data):
        return cls(**{k: int(v) for k, v in data.items()})

    def to_seconds(self):
        return self.minute * 60 + self.second


strings = ["1m57s", "17s520ms", "1m", "10s"]
time_parse = re.compile(r"((?P<minute>\d+)(m(?!s)))?((?P<second>\d+)(s))?((?P<ms>\d+)(ms))?")

d = [Time.init(time_parse.match(v).groupdict(default='0')) for v in strings]

print(d)
print([v.to_seconds() for v in d])

[Time(minute=1, second=57, ms=0), Time(minute=0, second=17, ms=520), Time(minute=1, second=0, ms=0), Time(minute=0, second=10, ms=0)]
[117, 17, 60, 10]
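
One caveat with the pattern above: every group is optional, so `match` also succeeds on input such as "abc" (it matches the empty string) and silently yields zeros. A minimal sketch of a stricter variant, assuming you would rather reject such input; the names `TIME_RE` and `parse_duration` are made up for this illustration:

```python
import re

# Same idea as the pattern above; (?!s) keeps the "m" of "ms" from being read as minutes.
TIME_RE = re.compile(
    r"(?:(?P<minute>\d+)m(?!s))?(?:(?P<second>\d+)s)?(?:(?P<ms>\d+)ms)?"
)


def parse_duration(text):
    """Return (minutes, seconds, milliseconds) or raise ValueError."""
    match = TIME_RE.fullmatch(text.strip())
    # fullmatch rejects trailing garbage; the empty match is rejected explicitly.
    if match is None or not match.group(0):
        raise ValueError("not a duration: %r" % text)
    parts = match.groupdict(default="0")
    return int(parts["minute"]), int(parts["second"]), int(parts["ms"])


for s in ["1m57s", "17s520ms", "1m", "10s", "200ms"]:
    print(s, parse_duration(s))
```

`fullmatch` requires Python 3.4 or newer; on older versions, anchoring the pattern with `$` and using `match` should behave the same way.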

Filart97, 2020-02-12
@Filart97

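In the first line, the culprit looks like the "or": it is Python's boolean operator (and it binds more loosely than "+", not more tightly), so CharsNotIn("s") or Empty() simply evaluates to CharsNotIn("s"). In theory, try rewriting that last operation with pyparsing's "|".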
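
To make this concrete, here is a rough pyparsing sketch rather than a definitive fix. It rests on two observations about the code in the question. First, Python's `or` returns its first truthy operand, so `Empty()` is never used there and "1m" fails because `CharsNotIn("s")` needs at least one character. Second, the minutes and seconds parsers each start at the beginning of the string, so the "57s" in "1m57s" is never reached. The sketch uses one combined grammar with `Optional` parts and a negative lookahead (`~Literal("s")`) after "m"; the names `duration` and `to_seconds` are made up for this example:

```python
from pyparsing import Literal, Optional, Suppress, Word, nums

number = Word(nums, max=8)

# "m" counts as minutes only when it is not followed by "s",
# otherwise "400ms" would be read as 400 minutes.
minutes = number("minutes") + Suppress(Literal("m") + ~Literal("s"))
seconds = number("seconds") + Suppress(Literal("s"))
mseconds = number("mseconds") + Suppress(Literal("ms"))

# One grammar for the whole string; every part may be absent.
duration = Optional(minutes) + Optional(seconds) + Optional(mseconds)


def to_seconds(text):
    """Return whole seconds for strings like "1m57s"; milliseconds are ignored."""
    result = duration.parseString(text.strip(), parseAll=True)
    return int(result.get("minutes", 0)) * 60 + int(result.get("seconds", 0))


for s in ["1m57s", "17s520ms", "1m", "10s", "200ms"]:
    print(s, to_seconds(s))
```

With `parseAll=True`, leftover text such as "1m57" raises a `ParseException` instead of being silently ignored.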
