R
R
romajke2017-11-13 13:07:59
C++ / C#
romajke, 2017-11-13 13:07:59

How to create a hash table using C?

I'm trying to figure out data structures, and now I'm in a stupor in front of hash tables.
The essence of the task is as follows: there is a large dictionary (~ 150k English words), you need to organize a quick search for a given word in it.
The first thing that comes to my mind is (to begin with, in order to understand the principles of work) to group words by the first letter. Accordingly, this will already speed up the search by 26 times (compared to searching through all the words).
I see all this as an array consisting of 26 linked lists. That is, we determine by the first letter of each word in which cell to attribute it, and add it to the list located in this cell. Accordingly, when we enter a word to search, we also determine by its first letter in which cell of the array to look for it, and we are already running through this list in search of the desired word.
Question one: do I understand correctly that this data structure where my words will be stored can be called a hash table, and the function that determines which array cell the next word belongs to can be called a hash function?
Question two: I theoretically understand how the algorithm should work, but I can’t implement it in any way. Already confused in all this work with memory, pointers, etc. Bit by bit I collect information in Google regarding dynamic lists and arrays of structures, but the picture does not add up yet :(
Tell me how to implement this and / or where can I read about creating such things?
PS I can attach the code of my attempts to implement this)
update:

My attempt to implement reading a dictionary into a hash table:
//максимальная длина слова в словаре
#define SIZE 45 

int hash_function (char* key);

//определяем структуру
typedef struct node
    {
        char *word1;
        struct node *next;
    } node;


//создаем хэш-таблицу, в которой будут хранится списки
node hashtable[26];


int main()
{
    char *text = "small";
    FILE *fp = fopen(text, "r");
    if (fp == NULL)
    {
        printf("Could not open %s.\n", text);
        return 1;
    }

    char word[SIZE];
    
    //инициализируем всю хэш-таблицу нулями, для того,
    //что бы далее была возможность проверить, пустая ячейка или нет
    for (int i=0; i < 26; i++)
    {
        hashtable[i].word1 = NULL;
        hashtable[i].next = NULL;
    }


    while(!feof(fp))
    {
        //считываем слово из словаря
        fscanf(fp,"%s",word);
        
        //определяем его место в массиве
        int k=hash_function(word);
        
        //если ячейка пуста - вставляем туда наше слово
        if(hashtable[k].word1==NULL)
        {
            
            hashtable[k].word1=(char*)malloc(sizeof(char) * SIZE);
            strcpy(hashtable[k].word1, word);
            hashtable[k].next=NULL;

        }
        else
        {
            // создать новый элемент
            node * new = malloc (sizeof (node));


            // инициализировать новый элемент
            new-> word1 = NULL;
            new-> next = NULL;

            // вставить новый элемент в голову списка
            new-> next = hashtable[k];
            hashtable[k] = new;
            strcpy(hashtable[k]->word1, word);

        }

  	}


    fclose(fp);

    return 0;
}

int hash_function (char* key)
{
  int hash = tolower (key [0]) - 'a';
  return hash;
}

Answer the question

In order to leave comments, you need to log in

2 answer(s)
J
jcmvbkbc, 2017-11-13
@romajke

Question one: do I understand correctly that this data structure where my words will be stored can be called a hash table, and the function that determines which array cell the next word belongs to can be called a hash function?

Yes.
Have you read it here?

R
res2001, 2017-11-13
@res2001

It would be easier to sort your dictionary once and save it in this form, search by binary search, without any hash tables and overhead. Will work faster than a hash table.

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question