M
M
Marchenko Oleg2015-05-26 21:27:24
Python
Marchenko Oleg, 2015-05-26 21:27:24

How to correctly use the binary search algorithm in Python?

Help the teapot, please.
The task is for the simplest algorithms, only loops are used from Python tools. It is clear that this can be done easier, but I want to understand binary search and then compare the running time of this algorithm with linear search.
It is required to count the number of capital letters in the text (file Source.txt). The code below hangs somewhere in the loops. How to fix it?

AllCapitalLetters = 'АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЫЭЮЯ' # Строка из всех прописных букв русского алфавита
CapitalInTEXT = []
myfile = open(r'c:\My_Scripts_Macros_and_Progs\Python\34\AnalyseOfText\Source.txt')
TEXT = myfile.readlines()  # Считывает содержимое файла, возвращает список строк
STR_TEXT = " ".join(TEXT)  # Сливает список в строку
k = 0  # Обнуляем переменные
i = 0  # первый элемент
j = len(AllCapitalLetters) - 1  # последний элемент
m = int((i + j) / 2)  # середина (приблизительно)
for k in STR_TEXT:
    while k != AllCapitalLetters[m] or i > j:
        m = int((i + j) / 2)  # найти новую середину
        if k > AllCapitalLetters[m]: # если значение больше серединного
            i = m + 1  # то сместить левую границу за середину
        else:  # иначе
            j = m - 1  # переместить правую границу до середины

    if i > j:
        None
    else:
        CapitalInTEXT.append(AllCapitalLetters[m])  # Добавить в список найденное значение

print(CapitalInTEXT)  # Вывести список
print(len(CapitalInTEXT))  # Вывести длину списка

PS The prototype of the algorithm is taken from here .

Answer the question

In order to leave comments, you need to log in

2 answer(s)
B
bobrovskyserg, 2015-05-26
@OMarchenko

I see several different questions:
How to calculate the number of capital letters in the text simply?
How to count the number of capital letters in a text quickly?
What the hell did I write and why is it not working?

#  просто и быстро
STR_TEXT = "Требуется произвести подсчет числа прописных букв в тексте."
AllCapitalLetters = set("АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ")
CapitalInTEXT = [c for c in STR_TEXT if c in AllCapitalLetters]
print(CapitalInTEXT)
print(len(CapitalInTEXT))

With Latin, I would roll this option:
STR_TEXT = "CapitalInTEXT = [c for c in STR_TEXT if 'A' <= c <= 'Z']"
CapitalInTEXT = [c for c in STR_TEXT if 'A' <= c <= 'Z']

This will not work with Cyrillic: in Unicode, for example
In [1]: ''.join(chr(i) for i in range(1025, 1072))
Out[1]: 'ЁЂЃЄЅІЇЈЉЊЋЌЍЎЏАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ'
different encodings.
Well, let's practice binary search:
AllCapitalLetters = 'ЁАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ'
# обрати внимание на порядок букв
# на неотсортированном AllCapitalLetters алгоритм виснет
CapitalInTEXT = []
for c in STR_TEXT:
    i = 0
    j = len(AllCapitalLetters) - 1
    while i <= j:
        m = (i + j) // 2
        cm = AllCapitalLetters[m]
        if c > cm:
            i = m + 1
        elif c < cm:
            j = m - 1
        else:
            CapitalInTEXT.append(cm)
            break

Instead of self-writing, you can use the standard library:
from bisect import bisect_left

AllCapitalLetters = 'ЁАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ'
CapitalInTEXT = []
for c in STR_TEXT:
    idx = bisect_left(AllCapitalLetters, c)
    if idx < len(AllCapitalLetters) and c == AllCapitalLetters[idx]:
        CapitalInTEXT.append(c)

In terms of speed, the simple version is 8 times better than the bisect version and 24 times better than the self-written one.
Binary search is useless for this task. And it is useful, for example, when searching for the nearest neighbors of the letter under test, which is not in AllCapitalLetters, for example. the letters "Ї".

L
Leo, 2015-05-26
@STLEON

Some kind of big binary search. Look at my omgit.ru/blog/binary-search

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question