PHP
Andrey, 2018-12-20 03:58:31

Why does Elasticsearch aggregation return strange results?

There is this code:

<?php
use Elasticsearch\ClientBuilder;

require_once $_SERVER["DOCUMENT_ROOT"] . '/inc/connect.php';
require 'vendor/autoload.php';

$client = ClientBuilder::create()->build();

$brands = ['Tommy Hilfiger', 'Tommy Jeans', 'Tommy Hilfiger', 'Tommy Jeans'];
foreach ($brands as $id => $brand) {
  $params = [
    'index' => 'my_index',
    'type' => 'my_type',
    'id' => $id,
    'body' => [
      'brand' => $brand
    ]
  ];

  $response = $client->index($params);
}

$params = [
    'index' => 'my_index',
    'type' => 'my_type',
    'body' => [
        'query' => [
            'bool' => [
                'must' => [
                    [ 'match' => [ 'brand' => 'tommy' ] ]
                ]
            ]
        ],
        'aggs' => [
            'brand' => [
                'terms' => [
                    'field' => 'brand',
                ]
            ]
        ]
    ]
];
echo '<pre>';
$response = $client->search($params);
print_r($response);

It returns the following result:
Array
(
    [took] => 7
    [timed_out] => 
    [_shards] => Array
        (
            [total] => 5
            [successful] => 5
            [failed] => 0
        )

    [hits] => Array
        (
            [total] => 4
            [max_score] => 0.19178301
            [hits] => Array
                (
                    [0] => Array
                        (
                            [_index] => my_index
                            [_type] => my_type
                            [_id] => 0
                            [_score] => 0.19178301
                            [_source] => Array
                                (
                                    [id] => 0
                                    [brand] => Tommy Hilfiger
                                )

                        )

                    [1] => Array
                        (
                            [_index] => my_index
                            [_type] => my_type
                            [_id] => 2
                            [_score] => 0.19178301
                            [_source] => Array
                                (
                                    [id] => 2
                                    [brand] => Tommy Hilfiger
                                )

                        )

                    [2] => Array
                        (
                            [_index] => my_index
                            [_type] => my_type
                            [_id] => 1
                            [_score] => 0.19178301
                            [_source] => Array
                                (
                                    [id] => 1
                                    [brand] => Tommy Jeans
                                )

                        )

                    [3] => Array
                        (
                            [_index] => my_index
                            [_type] => my_type
                            [_id] => 3
                            [_score] => 0.19178301
                            [_source] => Array
                                (
                                    [id] => 3
                                    [brand] => Tommy Jeans
                                )

                        )

                )

        )

    [aggregations] => Array
        (
            [brand] => Array
                (
                    [doc_count_error_upper_bound] => 0
                    [sum_other_doc_count] => 0
                    [buckets] => Array
                        (
                            [0] => Array
                                (
                                    [key] => tommy
                                    [doc_count] => 4
                                )

                            [1] => Array
                                (
                                    [key] => hilfiger
                                    [doc_count] => 2
                                )

                            [2] => Array
                                (
                                    [key] => jeans
                                    [doc_count] => 2
                                )

                        )

                )

        )

)

I can't figure out why [buckets] contains tommy - 4, hilfiger - 2 and jeans - 2, when in theory it should give me tommy hilfiger - 2 and tommy jeans - 2.

3 answer(s)
Andrey Schultz, 2018-12-21
Shults @noroots

Figured it out myself in the end. Here is a working example:

<?php
use Elasticsearch\ClientBuilder;

require 'vendor/autoload.php';

$client = ClientBuilder::create()->build();

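// Create the index with an explicit mapping. Each "raw-*" field is not_analyzed,
// so a terms aggregation sees the whole value as a single term.
// Note: 'string' with 'index' => 'not_analyzed' is the pre-5.0 mapping syntax;
// Elasticsearch 5.0 and later use 'type' => 'keyword' for the same purpose.
$params = [
    'index' => 'my_index',
    'body' => [
        'mappings' => [
            'my_type' => [
                'properties' => [
                    'brand' => [
                        'type'  => 'string',
                    ],
                    'raw-brand' => [
                        'type'  => 'string',
                        'index' => 'not_analyzed'
                    ],
                    'color' => [
                        'type'  => 'string',
                    ],
                    'raw-color' => [
                        'type'  => 'string',
                        'index' => 'not_analyzed'
                    ],
                    // category / raw-category are indexed and aggregated below,
                    // so they are mapped explicitly like the other fields.
                    'category' => [
                        'type'  => 'string',
                    ],
                    'raw-category' => [
                        'type'  => 'string',
                        'index' => 'not_analyzed'
                    ],
                    'id' => [
                        'type'  => 'integer',
                    ]
                ]
            ]
        ]
    ]
];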

$client->indices()->create($params);

$items = [
    [
        'id'        => 1,
        'category'  => 'Jackets',
        'brand'     => 'Tommy Hilfiger',
        'color'     => 'Red'
    ],
    [
        'id'        => 2,
        'category'  => 'Jeans',
        'brand'     => 'Tommy Jeans',
        'color'     => 'Navy'
    ],
    [
        'id'        => 3,
        'category'  => 'Shirts',
        'brand'     => 'Tommy Hilfiger',
        'color'     => 'Maroon'
    ],
    [
        'id'        => 4,
        'category'  => 'Trousers',
        'brand'     => 'Tommy Jeans',
        'color'     => 'Grey'
    ],
    [
        'id'        => 5,
        'category'  => 'Shirts',
        'brand'     => 'Tommy Hilfiger',
        'color'     => 'Grey'
    ],
    [
        'id'        => 6,
        'category'  => 'Sneakers',
        'brand'     => 'Tommy Jeans',
        'color'     => 'Grey'
    ],
    [
        'id'        => 7,
        'category'  => 'Sneakers',
        'brand'     => 'Tommy Jeans',
        'color'     => 'Grey'
    ]
];

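// Index every item. The "raw-*" copies are lowercased and stripped of spaces,
// so 'Tommy Jeans' is stored as the single term 'tommyjeans'.
foreach ($items as $item) {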
    $params = [
        'index' => 'my_index',
        'type' => 'my_type',
        'id' => $item['id'],
        'body' => [
            'brand'         => $item['brand'],
            'raw-brand'     => strtolower(str_replace(' ', '', $item['brand'])),
            'color'         => $item['color'],
            'raw-color'     => strtolower(str_replace(' ', '', $item['color'])),
            'category'      => $item['category'],
            'raw-category'  => strtolower(str_replace(' ', '', $item['category']))
        ]
    ];

    $client->index($params);
}

// Search on the analyzed fields, aggregate on the not_analyzed "raw-*" fields.
$params = [
    'index' => 'my_index',
    'body' => [
        'query' => [
            'bool' => [
                'must' => [
                    [ 'match' => [ 'brand' => 'tommy' ] ],
                    [ 'match' => [ 'color' => 'grey' ] ]
                ]
            ]
        ],
        'aggs' => [
            'brands' => [
                'terms' => [
                    'field' => 'raw-brand',
                ],
            ],
            'colors' => [
                'terms' => [
                    'field' => 'raw-color',
                ]
            ],
            'categories' => [
                'terms' => [
                    'field' => 'raw-category',
                ]
            ]
        ]
    ]
];

$response = $client->search($params);
echo '<pre>';
print_r($response);
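
To read the buckets back out of the response, something like the following may help. It is only a minimal sketch: bucketCounts() is a hypothetical helper (it is not part of the Elasticsearch PHP client), and the sample array just copies the aggregation part of the print_r output from the question.

```php
<?php
// Hypothetical helper: flatten one terms aggregation into a [key => doc_count] map.
// Returns an empty array if the aggregation is missing from the response.
function bucketCounts(array $response, string $aggName): array
{
    $buckets = $response['aggregations'][$aggName]['buckets'] ?? [];
    // array_column() takes doc_count as the value and key as the array index.
    return array_column($buckets, 'doc_count', 'key');
}

// Sample shaped like the print_r output in the question (aggregation part only).
$sample = [
    'aggregations' => [
        'brand' => [
            'doc_count_error_upper_bound' => 0,
            'sum_other_doc_count' => 0,
            'buckets' => [
                ['key' => 'tommy', 'doc_count' => 4],
                ['key' => 'hilfiger', 'doc_count' => 2],
                ['key' => 'jeans', 'doc_count' => 2],
            ],
        ],
    ],
];

$counts = bucketCounts($sample, 'brand');
print_r($counts); // tommy => 4, hilfiger => 2, jeans => 2
```

With the working example above, the equivalent call would be bucketCounts($response, 'brands').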

grinat, 2018-12-20
@grinat

Because all four documents contain the word tommy.

hOtRush, 2018-12-20
@hOtRush

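Because a field used in a terms aggregation must be mapped with index => not_analyzed; otherwise the aggregation runs on the individual analyzed tokens (tommy, hilfiger, jeans) instead of the whole value.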
https://www.elastic.co/guide/en/elasticsearch/guid...
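
A rough way to see the difference without a running cluster is to imitate it in plain PHP. The sketch below is an illustration only: analyze() is a hypothetical stand-in that lowercases the value and splits it on non-word characters, which approximates the default analysis but is not Elasticsearch's actual implementation, and termsBuckets() is likewise made up for this example.

```php
<?php
// Hypothetical stand-in for an analyzed string field:
// lowercase the value, split it into word tokens, keep each token once per document.
function analyze(string $value): array
{
    $tokens = preg_split('/\W+/', strtolower($value), -1, PREG_SPLIT_NO_EMPTY);
    return array_values(array_unique($tokens));
}

// Hypothetical stand-in for a not_analyzed field: the whole value is one term.
function keepWhole(string $value): array
{
    return [$value];
}

// Count, for every term, how many documents contain it.
function termsBuckets(array $docs, callable $toTerms): array
{
    $counts = [];
    foreach ($docs as $value) {
        foreach ($toTerms($value) as $term) {
            $counts[$term] = ($counts[$term] ?? 0) + 1;
        }
    }
    return $counts;
}

// Same brand values as in the question.
$brands = ['Tommy Hilfiger', 'Tommy Jeans', 'Tommy Hilfiger', 'Tommy Jeans'];

$analyzed    = termsBuckets($brands, 'analyze');
$notAnalyzed = termsBuckets($brands, 'keepWhole');

print_r($analyzed);    // tommy => 4, hilfiger => 2, jeans => 2
print_r($notAnalyzed); // Tommy Hilfiger => 2, Tommy Jeans => 2
```

The first result has the same counts as the buckets in the question; the second is what the question expected to get.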
