How to display the longest string of characters in alphabetical order?

D

Dmustache Usakov2021-05-16 19:52:27

Python

Dmustache Usakov, 2021-05-16 19:52:27

Task:
The file contains 100,000 capital Latin letters. Find a subsequence of consecutive letters that are arranged in non-decreasing alphabetical order (for example, in ABCABA these are the sequences A, B, C, A, B, A, AB, ABC, BC, AB). In your answer, indicate the longest of the sequences encountered in the file. If there are several sequences of this length, indicate the first one.
(This may include sequences like: ABBBCCCDEEFFFF )
File
My code:

with open('file.txt', mode='r', encoding='utf-8') as f:
    text = f.read()

lst = [chr(i) for i in range(ord('A'), ord('Z') + 1)]
abc = ''.join(lst) #алфавит

elems = [] #последовательности
for i in range(len(text) - 1):
    passer = False
    elems.append(text[i])
    j = 1
    while passer == False:
        if len(text) < i + j + 1:
            passer = True
        else:
            if ord(text[i + j]) - ord(elems[-1][-1]) < 2:
                elems[-1] += text[i + j]
            else:
                passer = True
        j += 1
print(max(elems, key=len))

I don't understand why character sequences like this one are passed - FCDBCDEFCDDDEDDBAAA

Reply

Answer the question

In order to leave comments, you need to log in

2 answer(s)

O

o5a, 2021-05-16
@Dmustache

The problem is because of the test condition in the sequence condition, it is required that the next letter be no less than the previous one, it is not clear why you suddenly wrote such a condition. In general, there is no need to save the sequences themselves somewhere, especially since only the first of the longest ones is required. How to do it: Iterate through the characters of the sequence and check if ord() of the current character has become less than the previous one. If it does, then the sequence is over. We check if this sequence has become larger than the previous one. If so, update the maximum sequence data (starting position and length). Otherwise, we move on to the next one. More or less like this:
if ord(text[i + j]) - ord(elems[-1][-1]) < 2

spoiler

text = ...
text_len = len(text)
max_len = 0
max_idx = 0
seq_idx = 0
prev = 0

for i, x in enumerate(text):
    if ord(x) < prev:
        seq_len = i-seq_idx
        if seq_len > max_len:
            max_len = seq_len
            max_idx = seq_idx
        seq_idx = i
    elif i==text_len-1:
        seq_len = i-seq_idx+1
        if seq_len > max_len:
            max_len = seq_len
            max_idx = seq_idx
    prev = ord(x)
print(max_len, text[max_idx:max_idx+max_len])

M

MinTnt, 2021-05-17
@MinTnt

Well, in principle, this can be solved through regex

import re
from string import ascii_uppercase as as_up

print(max(re.findall('*'.join(as_up)+'*', s), key=len))