B
B
Bruceee2017-08-07 01:06:42
Python
Bruceee, 2017-08-07 01:06:42

How to implement a search for occurrences of different lengths in a string from larger to smaller?

Please tell me how to implement the following:
there are three pairs of values ​​in the form of a dict dictionary. In the input string, you need to search and replace substrings from the dictionary in the following logic:("a b c"; "x") , ("a b"; "y"), ("a", "z")

  • if the string contains a substring "a b c", then you need to replace the occurrence "a, b, c"with "x", according to the first pair
  • if the string contains a substring "a b", then you need to replace the occurrence "a, b"with "y", according to the second pair
  • if the string contains only "a", then it should be replaced by "z", according to the third pair

That is, if there are long sequences that have an occurrence in the string, then first replace them, and so on until the shortest ones.
So far, I see only an option to make three dictionaries - with long sequences, with medium ones and with the shortest ones, and go through these dictionaries in turn. But the length can be longer than in the example, so I would like to find a smarter search to use only one dictionary.
There is also an idea to use a new data structure: firstly, ordered by the length of the sequences, secondly, storing both pairs of values ​​from the dictionary, then, accordingly, the search would go from the longest to the shortest.
Tell me, please, what data structure and how best to use?
How to store the new data structure? It is easy to store a dictionary as text in a separate file, is it possible to do this with the created data structure?

Answer the question

In order to leave comments, you need to log in

4 answer(s)
L
longclaps, 2017-08-07
@Bruceee

import re

d = {"a b c": "x", "a b": "y", "a": "z"}
data = "a b c a b a a a b c"
print(re.sub(r'a( b( c)?)?', lambda m: d[m.group()], data))
print(re.sub(r'a b c|a b|a', lambda m: d[m.group()], data)) # можно и так

A
Alexander, 2017-08-07
@fireSparrow

The collections module has an OrderedDict structure, which is a dictionary that stores the order of elements.
If you fill it with key-value pairs in descending order of key length, then this is how it will store them, and when iterating, it will give long keys first.
The only thing to remember is that after such a dictionary is built, if necessary, add a new pair, the dictionary will need to be completely rebuilt.

M
maxfox, 2017-08-07
@maxfox

You need to store the replacement rules not in a dictionary, but, for example, in a list of tuples. The dictionary does not make sense in this problem, because you will iterate over all replacement options, and not select them by key. As a result, we go through the list of tuples and make a replacement via .replace(). Two lines of code.

X
xmoonlight, 2017-08-07
@xmoonlight

1. Sort the dictionary by the number of words: the longest chains are up.
2. Replace in the usual way: line by line, checking the current line for all possible matches by the number of words, etc.

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question