symfony
Igor, 2019-11-12 04:05:02

How to process a large amount of data in Symfony without a memory leak?

Greetings, colleagues!
I've run into a problem.
I need to process a large amount of data: about 5,000,000 rows in the database that I have to link to categories.
It is a one-time task.
The script leaks memory!
The server has about 48 GB of RAM, and I'm afraid that won't be enough.
Please help me improve the code and reduce the leak.

class OrganizationToCategory extends Command
{

    /** @var EntityManager */
    private $entityManager;

    /**
     * OrganizationToCategory constructor.
     * @param ObjectManager $manager
     */
    public function __construct(ObjectManager $manager)
    {
        $this->entityManager = $manager;
        $manager->getConnection()->getConfiguration()->setSQLLogger(null);
        parent::__construct();
    }

    /**
     *
     */
    protected function configure()
    {
        $this->setName('app:organization-to-category');
    }

    /**
     * @param InputInterface $input
     * @param OutputInterface $output
     * @return int|void|null
     * @throws ORMException
     * @throws \Doctrine\Common\Persistence\Mapping\MappingException
     */
    protected function execute(InputInterface $input, OutputInterface $output)
    {


        $offset = 0;
        while (true) {

            if (!$this->entityManager->isOpen()) {
                $this->entityManager = $this->entityManager->create(
                    $this->entityManager->getConnection(),
                    $this->entityManager->getConfiguration()
                );
            }

            /** @var Organization $organization */
            $organization = $this->getOrganizationOne($offset);

            if (!is_object($organization)) {
                break;
            }

            $category_names = explode("|", $organization->getServicesStr());

            foreach ($category_names as $category_name) {


                $category_repository = $this->entityManager->getRepository("App:Category");

                /** @var ArrayCollection $categories */
                $categories = $category_repository->findBy([
                    "name" => $category_name
                ]);

                if (!is_array($categories)) {
                   continue;
                }

                /** @var Category $category */
                foreach ($categories as $category) {
                    if ($category_name === $category->getName()) {
                        $name = $organization->getName();

                        try {
                            $this->insert($organization->getId(), $category->getId());
                            $output->writeln(sprintf("<info>$name to $category_name memory_get_usage: %d</>", memory_get_usage(true)));
                        }catch (Exception $e) {
                            $output->writeln(sprintf("<error>Already linked!  memory_get_usage: %d</>", memory_get_usage(true)));

                            if (!$this->entityManager->isOpen()) {
                                $this->entityManager = $this->entityManager->create(
                                    $this->entityManager->getConnection(),
                                    $this->entityManager->getConfiguration()
                                );
                            }
                        }
                    }
                }
            }

            foreach ($categories as $category) {
                $this->entityManager->detach($category);
                unset($category);
            }

            $this->entityManager->detach($organization);
            $this->entityManager->clear();
            $this->entityManager->close();

            unset($categories);
            unset($category_names);
            unset($organization);
            unset($category_repository);

            $offset++;
        }


    }


    /**
     * @param int $offset
     * @return mixed
     */
    private function getOrganizationOne($offset = 0)
    {
        /** @var OrganizationRepository $organization_repository */
        $organization_repository = $this->entityManager->getRepository("App:Organization");
        return $organization_repository->get(0,0,0, $offset, 1)[0];
    }


    /**
     * @param int $organization_id
     * @param int $category_id
     * @throws DBALException
     */
    private function insert(int $organization_id, int $category_id)
    {
        $conn = $this->entityManager->getConnection();
        $sql = "
            INSERT INTO `organizations_categories` (`organization_id`, `category_id`) 
            VALUES (:organization_id, :category_id);
        ";

        $stmt = $conn->prepare($sql);
        $stmt->execute([
            "organization_id" => $organization_id,
            "category_id" => $category_id,
        ]);
    }
}

Memory growth over time:
ЖЭК № 1 to Коммунальная служба memory_get_usage: 1940389888
Уктс to Коммунальная служба memory_get_usage: 1940389888
ЖЭУ № 7 to Коммунальная служба memory_get_usage: 1940389888
ЖКУ to Коммунальная служба memory_get_usage: 1940389888
Универсал to Коммунальная служба memory_get_usage: 1940389888
КПД to Коммунальная служба memory_get_usage: 1940389888
Пункт Приема Жилищно-Коммунальных Платежей МУП to Коммунальная служба memory_get_usage: 1940389888
Коммунальные платежи to Коммунальная служба memory_get_usage: 1942487040
Группа компаний ЖКХ Сервис to Коммунальная служба memory_get_usage: 1942487040
Группа компаний ЖКХ Сервис to Водосчетчики, газосчетчики, теплосчетчики memory_get_usage: 1942487040
СТ Теплотехник to Коммунальная служба memory_get_usage: 1942487040
Ужэк Домоуправ-НТ to Коммунальная служба memory_get_usage: 1942487040
ЖЭУ № 6 to Коммунальная служба memory_get_usage: 1942487040
Печатный Дом Крым to Широкоформатная печать memory_get_usage: 1942487040
Печатный Дом Крым to Рекламная продукция memory_get_usage: 1942487040
Печатный Дом Крым to Типография memory_get_usage: 1942487040
Типография Консул to Полиграфические услуги memory_get_usage: 1942487040
Типография Консул to Типография memory_get_usage: 1942487040
Константа to Издательские услуги memory_get_usage: 1942487040
Константа to Полиграфические услуги memory_get_usage: 1942487040
Константа to Типография memory_get_usage: 1942487040

2 answers
Andrey, 2019-11-12
@IgorPI

1. Use batch inserts.
2. Fetch the rows as arrays rather than entity objects, and select only the fields you need. Fetch exactly as much as you need.
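
Both points can be sketched roughly as follows. This is only an illustration, not the asker's actual setup: it uses plain PDO with an in-memory SQLite database so that it runs standalone, and the `organizations` table, its `services_str` column, the `insertPairs()` helper, and the sample ids are all hypothetical names chosen for the example.

```php
<?php
// Sketch only: plain PDO + in-memory SQLite (an assumption, so the example runs
// standalone). Table/column names other than organizations_categories are hypothetical.

/**
 * Insert [organization_id, category_id] pairs using one multi-row INSERT per chunk.
 * Returns the number of rows inserted.
 */
function insertPairs(PDO $pdo, array $pairs, int $chunkSize = 400): int
{
    $inserted = 0;
    foreach (array_chunk($pairs, $chunkSize) as $chunk) {
        // Build "(?, ?), (?, ?), ..." with one placeholder group per pair.
        $placeholders = implode(', ', array_fill(0, count($chunk), '(?, ?)'));
        $stmt = $pdo->prepare(
            "INSERT INTO organizations_categories (organization_id, category_id) VALUES $placeholders"
        );
        // Flatten [[1, 10], [1, 11]] into [1, 10, 1, 11] for the positional parameters.
        $stmt->execute(array_merge(...$chunk));
        $inserted += $stmt->rowCount();
    }
    return $inserted;
}

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE organizations (id INTEGER PRIMARY KEY, name TEXT, services_str TEXT)');
$pdo->exec('CREATE TABLE organizations_categories (organization_id INTEGER, category_id INTEGER)');
$pdo->exec("INSERT INTO organizations (id, name, services_str) VALUES (1, 'A', 'x|y'), (2, 'B', 'x')");

// Point 2: fetch plain arrays with only the needed columns, no entity objects.
$rows = $pdo->query('SELECT id, services_str FROM organizations ORDER BY id')
    ->fetchAll(PDO::FETCH_ASSOC);

// Point 1: three hypothetical (organization_id, category_id) pairs, two per statement.
$pairs = [[1, 10], [1, 11], [2, 10]];
$inserted = insertPairs($pdo, $pairs, 2);
```

The chunk size is a trade-off: larger chunks mean fewer round trips, but each database limits the number of bound parameters per statement, so it should be tuned for the actual server.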

index0h, 2019-11-12
@index0h

Check whether debug mode is enabled; the symptoms look very much like it. For the inserts, if I were you, I would wrap them in transactions in batches of about 1k. Recalculating the indexes separately for every single insert is very sad. If you don't have many categories, it makes sense to load them all into memory instead of hitting MySQL again for each lookup. There is absolutely no point in calling detach together with clear / close.
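
A rough sketch of the transaction batching and the in-memory category map, under some assumptions: it uses plain PDO with in-memory SQLite so that it runs on its own, category names are assumed to be unique, and `loadCategoryMap()`, `linkOrganizations()`, the `categories` table, and the sample rows are hypothetical names for illustration.

```php
<?php
// Sketch only: plain PDO + in-memory SQLite (an assumption, so the example runs
// standalone). Helper names, the categories table and the sample data are hypothetical.

/** Load all categories once as a [name => id] map (assumes unique names). */
function loadCategoryMap(PDO $pdo): array
{
    // FETCH_KEY_PAIR maps the first selected column to the second.
    return $pdo->query('SELECT name, id FROM categories')->fetchAll(PDO::FETCH_KEY_PAIR);
}

/** Insert organization/category links, committing every $batchSize inserted rows. */
function linkOrganizations(PDO $pdo, iterable $organizations, array $categoryIdByName, int $batchSize = 1000): int
{
    $stmt = $pdo->prepare(
        'INSERT INTO organizations_categories (organization_id, category_id) VALUES (?, ?)'
    );
    $inserted = 0;
    $pdo->beginTransaction();
    try {
        foreach ($organizations as $org) {
            foreach (explode('|', $org['services_str']) as $name) {
                if (!isset($categoryIdByName[$name])) {
                    continue; // unknown category name: skip, no database lookup needed
                }
                $stmt->execute([$org['id'], $categoryIdByName[$name]]);
                if (++$inserted % $batchSize === 0) {
                    // Close the current batch and open the next one.
                    $pdo->commit();
                    $pdo->beginTransaction();
                }
            }
        }
        $pdo->commit(); // commit the last, possibly partial, batch
    } catch (Throwable $e) {
        if ($pdo->inTransaction()) {
            $pdo->rollBack();
        }
        throw $e;
    }
    return $inserted;
}

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT UNIQUE)');
$pdo->exec('CREATE TABLE organizations_categories (organization_id INTEGER, category_id INTEGER)');
$pdo->exec("INSERT INTO categories (id, name) VALUES (1, 'Printing'), (2, 'Utilities')");

// Organizations as plain arrays; a batch size of 2 just exercises the commit path.
$organizations = [
    ['id' => 10, 'services_str' => 'Printing|Utilities'],
    ['id' => 11, 'services_str' => 'Utilities|Unknown'],
];
$map = loadCategoryMap($pdo);
$count = linkOrganizations($pdo, $organizations, $map, 2);
```

With Doctrine, the connection returned by `$this->entityManager->getConnection()` also exposes `beginTransaction()`, `commit()` and `rollBack()`, so the same batching pattern should carry over. For the full table, the organizations would presumably be read in chunks rather than loaded at once.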
