loly, 2016-04-11 00:06:55

How do I properly create an index in Elasticsearch?

For clarity, I've switched to a completely different domain and renamed the fields (компании = companies, название = name, работники = employees, должность = position).
Example:

"компании" : 
[{
   "название" : "value",
   "hash" : "UniqueValue",
   ...
   "работники" : {
         "должность" : ["EmployeeName", "EmployeeName", "EmployeeName"],
         "должность" : "EmployeeName",
         ...
         "должность" : "EmployeeName"
      },
   "key" : "value"
}, ...and so on: more objects with the same structure as the previous one]

There are 1000 distinct positions in total, but that doesn't mean a single company has all of them: each company has at most 15 positions (this limit doesn't need to be strictly enforced if it doesn't affect performance). (This sounds a bit silly in this context, but there's no way around it.)
And here is the index creation code my questions are about:
client.indices.create({
  index: "компании",
  body: {
    "mappings": {
      "компания": {
        "properties": {
          "название": {
            "type": "string"
          },
          "hash": {
            "type": "string"
          },
          // ...
          "работники": {
            // ??? (what goes here?)
          }
        }
      }
    }
  }
}, function (error, response) {
  if (error) {
    return console.error(error);
  }
  var body = GetData();
  client.bulk({
    body: body
  }, function (err, resp) {
    if (err) {
      return console.error(err);
    }
    res.render('index', {result: 'Indexing Completed!'});
  });
});

The actual questions:
1) When adding a new company, I need to check whether a company with the same "hash" value already exists. If it does, all of its fields should be replaced; if not, the company should be added. The hash is pre-generated and cannot be changed. How do I implement this?
2) How do I model the list of employees when there are so many different keys (i.e. positions)?
What I have considered:
1) Simply delete documents by the "hash" field first. That would mean about 10 deletions per second. This seems like the wrong approach and would clearly hurt performance (or maybe I'm wrong and this is how it's supposed to be done?).
2) Use something like {"type": "string"} : {"type": "string"} (this doesn't work; I didn't expect it to, but it took less than a minute to check).
P.S. In 99% of cases, searches will be by employees (possibly across all 15 positions).


1 answer(s)
Igor Makarov, 2016-04-11
@loly

1. You don't need to check anything. If you don't need the old data, a request that replaces the whole document is the same as a request that adds one: the query below either creates a new document or completely replaces the existing one.

PUT /компании/компания/{_id}
{
    "название": "SpaceX",
    "работники":  ...
}

If "hash" is unique for each document, it can be used as the _id:
PUT /компании/компания/{hash}
{
    ...
}
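Assuming the Node.js client in the question is the legacy `elasticsearch` npm package (which provides `client.indices.create` and `client.bulk`), the same create-or-replace request can be sent with `client.index`, passing the hash as `id`. This is a minimal sketch: `saveCompany` is a hypothetical helper, each company object is assumed to carry its `hash` field, and the client is passed in so the function can be exercised without a running cluster.

```javascript
// Hypothetical helper: index (create or fully replace) one company,
// using its pre-generated hash as the document _id.
function saveCompany(client, company, callback) {
  if (typeof company.hash !== "string" || company.hash === "") {
    // Without an explicit id, Elasticsearch would auto-generate one,
    // and re-sending the same company would create a duplicate.
    return callback(new Error("company.hash is required"));
  }
  client.index({
    index: "компании",
    type: "компания",   // mapping types still exist in Elasticsearch 2.x
    id: company.hash,   // same _id => the stored document is replaced
    body: company
  }, callback);
}
```

Because the `_id` is deterministic, saving the same company twice leaves a single document holding the latest version, so neither a prior existence check nor a delete is needed.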

In earlier versions (before 1.5), the _id field could be mapped to a path in the document, so it was populated automatically from that field:
"mappings": {
    "компания": {
        // in the current version (2.3) this is deprecated: it hurt performance
        "_id": {"path": "hash"},
        "properties": {
            "название": {
                "type": "string"
            },
            ...
        }
    }
}
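Since the mapping can no longer derive `_id` from a document field, the id has to be set on the client side. The question loads data through `client.bulk`; in a bulk request, an `index` action with an explicit `_id` also creates or replaces the document. A minimal sketch that builds the alternating action/document array the legacy client accepts as `body`, assuming each company object has a `hash` field (`buildBulkBody` is a hypothetical name):

```javascript
// Hypothetical helper: turn an array of company objects into a bulk body.
// Each document is preceded by an "index" action whose _id is the hash,
// so re-sending a company replaces the stored copy instead of duplicating it.
function buildBulkBody(companies, indexName, typeName) {
  const body = [];
  for (const company of companies) {
    if (!company.hash) {
      throw new Error("every company needs a hash to be used as _id");
    }
    body.push({ index: { _index: indexName, _type: typeName, _id: company.hash } });
    body.push(company);
  }
  return body;
}
```

The result could then be passed as `client.bulk({ body: buildBulkBody(GetData(), "компании", "компания") }, ...)`, assuming `GetData()` from the question returns an array of company objects.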

2. Elasticsearch has no dedicated array type. Any field in a document can hold multiple values, but all of them must be of the same type. For inner objects the type can be object or nested; nested makes searching more convenient when a document contains many inner objects, because each of them is indexed and matched separately.
If I understand correctly and the positions differ, nested will be more convenient; otherwise, use object.
"mappings": {
    "компания": {
        "properties": {
            "работники": {
                "type": "nested",
                "properties": {
                    "должность": {
                        "type": "string"
                    },
                    "имя": {
                        "type": "string"
                    }
                }
            }
        }
    }
}
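To see why `nested` matters for searching by position, here is a simplified in-memory model of the documented behaviour: with the `object` type, an array of inner objects is flattened into one list of values per field, so the link between a position and a name is lost; with `nested`, each inner object is matched on its own. This is an illustration only (the function names are hypothetical, and it is not how Lucene actually stores data):

```javascript
// Simplified model of how an "object" field flattens an array of objects:
// every sub-field becomes one combined list of values.
function flattenObjectArray(items) {
  const flat = {};
  for (const item of items) {
    for (const [field, value] of Object.entries(item)) {
      const values = Array.isArray(value) ? value : [value];
      flat[field] = (flat[field] || []).concat(values);
    }
  }
  return flat;
}

// Does a field (single value or array of values) contain the wanted value?
function hasValue(value, wanted) {
  return Array.isArray(value) ? value.includes(wanted) : value === wanted;
}

// "object" model: both conditions are checked against the flattened lists,
// so they may be satisfied by two different employees.
function matchesAsObject(items, position, name) {
  const flat = flattenObjectArray(items);
  return hasValue(flat["должность"] || [], position) && hasValue(flat["имя"] || [], name);
}

// "nested" model: both conditions must hold within the same inner object.
function matchesAsNested(items, position, name) {
  return items.some(item => hasValue(item["должность"], position) && hasValue(item["имя"], name));
}
```

With the employees from the example document below, the pair ("кассир", "Дмитрий") matches under the `object` model even though Дмитрий is not a cashier, but it does not match under the `nested` model.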

// create/update
PUT /компании/компания/{_id}
{
    "название": "...",
    "hash": "...",
    "работники": [
        {
            "должность": "манагер",
            "имя": ["Анатолий", "Андрей"]
        },
        {
            "должность":  ["управляющий", "заместитель"],
            "имя": "Дмитрий"
        },
        {
            "должность": "кассир",
            "имя": ["Татьяна", "Анастасия"]
        }
    ]
}
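The question stores employees as an object keyed by position, while the nested mapping above expects an array of `{должность, имя}` objects. A minimal conversion sketch, assuming the source data maps each position to one name or an array of names; `toNestedEmployees` is a hypothetical helper:

```javascript
// Hypothetical converter from { "<position>": name | [names] }
// to the nested shape [{ "должность": "<position>", "имя": [names] }].
function toNestedEmployees(byPosition) {
  return Object.entries(byPosition).map(([position, names]) => ({
    "должность": position,
    // Normalize to an array; a field can hold one value or several.
    "имя": Array.isArray(names) ? names : [names]
  }));
}
```

Each company then gets one inner object per position, so at most 15 per document given the limit described in the question.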

// sample search: companies that employ both a "управляющий" and a "кассир"
GET /компании/компания/_search
{
    "query": {
        "bool": {
            "must": [
                {
                    "nested": {
                        "path": "работники",
                        "query": { "match": { "работники.должность": "управляющий" }}
                    }
                },
                {
                    "nested": {
                        "path": "работники",
                        "query": { "match": { "работники.должность": "кассир" }}
                    }
                }
            ]
        }
    }
}
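Since most searches filter by several positions (up to 15), the query body can be generated from a list. A single `nested` query matches only when one inner object satisfies all of its conditions, so to require that a company has every listed position, each position needs its own `nested` clause inside an outer `bool`. A minimal sketch producing a body for `client.search` (`buildPositionsQuery` is a hypothetical name):

```javascript
// Hypothetical query builder: companies that have ALL of the given positions.
// Each position gets its own nested clause, because one nested query only
// matches when a single inner object satisfies every condition in it.
function buildPositionsQuery(positions) {
  return {
    query: {
      bool: {
        must: positions.map(position => ({
          nested: {
            path: "работники",
            query: { match: { "работники.должность": position } }
          }
        }))
      }
    }
  };
}
```

To match companies with *any* of the positions instead, the same clauses could go into `should` with `minimum_should_match: 1`.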
