V
V
Vladimir Chernyshev2010-11-26 22:44:40
PHP
Vladimir Chernyshev, 2010-11-26 22:44:40

Chat in PHP: database bottleneck - how to solve?

There is a task to organize a simple chat with a web interface and a full history on an existing site on a self-written engine (PHP5.3.3/MySQL5.1). Googling on existing solutions did not give anything good, either redundantly, or produces a feeling of "knee-knee crafts" and most often has not been supported for a long time, and I would like to have one architecture and coding style. In general, the decision was made to implement independently. There are no special problems with coding, the prototype was implemented, but load testing with different versions of indexes and tables showed that with already ~ 20 “reader” hosts and one “writer” per second, MySQL shuts up (VDS with 1GB of RAM, half is given to the muscle, and 2GHz percent, nginx + php-frpm under Debian) even on a denormalized table, since requests cannot be cached using database tools (each “reader” has its own filters, because private, filtering in the server application is unlikely to be more efficient than in the database, as it seems to me, but the client is unacceptable). And I would like to keep at least 40-50 on this hardware in addition to the main load. What can help? There is no “highload” experience, the following ideas arose:

- write a daemon for a chat on a subdomain so that it reads in the main thread from the database only at startup (the last N messages) or rare specific requests, stores them in the process memory (killing the old ones), and writes only “logs” to the database for the next start (then filtering will be effective, IMHO, plus it can be carried out ahead of time and incrementally, storing the messages themselves in a single pool, and for each reader add to the list of links to "his" messages when a message is received from the "writer" personally for him or public , and remove them from there when reading)

- use the memcache in the same way (although so far I can hardly imagine how to ensure integrity, before that I only worked with file caches that do not “evaporate” themselves) for a regular PHP handler (that is, so that a bunch of workers have access to a common pool of messages and incremental personal lists of links to them between requests)

- translate chat to NoSQL DBMS (what? The main task is effective filtering by a pair of fields of the last records, such as WHERE timestamp > {last_time} (or id> {last_id}) AND (recipient_id IS NULL OR recipient_id={ user_id}) ORDER BY timestamp (or id) DESC LIMIT {max_records} )

What is worth trying or what other options might there be? I don't want to write a demon, as it will complicate the administration of both the server and the chat itself (an analogue of IRC commands to do?), There is practically no experience with the cache and NoSQL.

Answer the question

In order to leave comments, you need to log in

15 answer(s)
E
eternals, 2010-11-27
@eternals

First, it is necessary to clarify what kind of VDS. If the masterhost, then immediately warn, because. this is a separate song.
Secondly, you need to understand why mySQL is stupid. Above correctly wrote about the lock at the time of recording.
Also, the disk may not be able to keep up (due to VDS restrictions).
I would try to translate innodb into the database and set commit to 0 (reset once a second to disk).
Further there is such piece as representations. And they have an in-memory cache mode. And to do SQL requests already to it. In addition, they can be spawned for different occasions.
It would also be nice not to forget about the memory tables. Let's say, write also to non-memory, but read massively only from it.
In passing it is worth remembering about indexes. Their absence makes select long, and excessive presence makes insert long. And the indexes themselves are usually done incorrectly by beginners.
I would also like to return to the VDS settings, because they cut the average load by processor, memory and disk operations. It can hurt you stupidly. And at the same time I also remembered the size of the database in memory. If there is already a lot of data, and the caches are not large, then it will cut across the disk.

V
Vlad Zhivotnev, 2010-11-27
@inkvizitor68sl

what kind of VDS is that, really? If OpenVZ, then you will never receive the resources you specified.

K
kolesnevg, 2010-11-27
@kolesnevg

Tips from myself
1) try to implement the server on node.js (one process like 800 connections keeps per second)
2) heavy requests, as far as I understand, in the case when there are 30 people in the chat, one of them writes, you save it in the database, then 29 people send a request to “update” the chat ... and 29 selections from the database are made, in my opinion this is not entirely correct, try it a little differently: when one user sends a message, of course you save it once for "history", but at the same time create an update queue for all "interested" users in the same memcache, you do not need to store one message per everyone, you need to store a complete list of what he is interested in for everyone, and of course, when 29 people ask for updates, you will not contact the database at all, but simply take the data from the cache (and clean it naturally)

@
@resurection, 2010-12-08
_

I remember 10-15 years ago chats were very popular. In addition to chats and forums, nothing else has been invented. No blogs, no social networks, no twitters, not even ajax. And thousands of people hung out in these chats (it was possible to open several channels). Or a hundred people chatting without channels.
Chats were on frames and every 5 seconds. the frame with the list of messages was updated FULLY (all HTML with the entire batch of messages).
Iron in those days was something like 300-400 MHz. Your VDS, I'm sure, would break those servers.
It is somehow strange now in 2010 to hear that the chat cannot withstand more than 30 simultaneous users.
Have you tried on a different server? At least on the locale, how much does it hold?

A
Alexey Ashurok, 2010-11-27
@AotD

You can write a daemon. Regular PHP daemon.
nanoserv or phpDaemon (the information about the latter slipped on Habré, but it is for real kung fu pandas :)).
Regular asynchronous socket connections, why do you need a DB? Why do you need such selections? Do you want to keep chat history?
If only public-private messages - elementary when a message arrives, the daemon either gives it to a specific socket, or brute-casts it to everyone. No "requests once per second".
On the client side, a flash drive that holds this same socket and receives / sends messages.

V
Vladson, 2010-11-27
@Vladson

There was somehow a task to rewrite the chat so that there were minimal brakes (the server was already overloaded), I used APC (I threw only sent messages “for history” into the database), in principle, I can advise a similar approach ...

S
slang, 2010-11-27
@slang

Yes, it immediately asks for key-value NoSQL storage + polling, so that after fixing the plugging of the database, the scripts on the web server do not get plugged.

V
Vyacheslav Plisko, 2010-11-29
@AmdY

write a demon, the algorithm is very simple:
1. a script to which messages are sent. not a demon. you put the message in the database and, if desired, in the memcache so that it is available to the demon. I did without memcache.
2. the daemon works and in a cycle overwrites the post.js file in which the last N messages are stored in json. At a certain point in time, you can clean the table and remove old posts altogether or to another table.
3. from the site ajax-ok take the file post.js?r=random_number and update the message panel. because js is taken, php is not raised, even apache can withstand such a load.
you can not use the database at all, leaving only memcache or using the shared memory php.net/shmop
If you need to share posts for users, then generate not one post.js file, but several /{user_key}/post.js and give the secret key to users.

S
standov, 2010-11-26
@standov

well, if you want exactly the database, then I would think in the direction of the intermediate layer - i.e. data from it to the database is written asynchronously with a certain filling, and reading only from it, from the database, data is poured into it only in the event of a total crash, P.S. these are thoughts no more, the topic of "the impossibility of caching requests" did not master

A
Alexey Zhurbitsky, 2010-11-26
@blo

I would like to know the average number of queries to the database at the time mysql is plugged. maybe a problem with the frontend? how is the data loaded? by chance not a periodic survey?

G
gonzazoid, 2010-11-27
@gonzazoid

as soon as you ask yourself what problems using sql solves, life will become much easier. checked.
Specifically, in your case, I would still recommend looking towards the perl daemon. There are ready-made developments, you only need a file.

W
Wott, 2010-11-27
@Wott

It is not entirely clear why the base at all. As far as I understand, it is used to transfer data between request processes (requests are also not needed - use websocket ) and for this you either need to write a separate daemon, or use a server with real fastcgi for php or fastcgi in any other language where the data will be shared. Well, it's correct to sort them out with semaphores.
Another option is a memcache, but you need to make a data structure in it so that it would be convenient to add messages and collect a response, but you don’t have to think about garbage collection.
Maybe the base is needed for the story ... although who needs it in chats? :) You can do it offline with a cron.
To start the server - if everything is done correctly, then it will not be necessary, since clients have a current situation, you just need to notify them of changes. And they are not needed at the start.

S
shagguboy, 2010-11-27
@shagguboy

try INNODB tables.
1) they are able to keep in memory not only indexes but also data.
2) it is not blocked when writing entirely
, let's EXPLAIN requests here.

E
Evgeny Bezymyannikov, 2010-11-28
@psman

<?php
# Connect to memcache:
global $memcache;
$memcache = new memcache;
# Gets key / value pair into memcache… called by mysql_query_cache()
function getCache($key) {
global $memcache;
return($memcache)? $memcache->get($key): false;
}
# Puts key / value pair into memcache… called by mysql_query_cache()
function setCache($key,$object,$timeout = 60) {
global $memcache;
return($memcache)? $memcache->set($key,$object,MEMCACHE_COMPRESSED,$timeout): false;
}
# Caching version of mysql_query()
function mysql_query_cache($sql,$linkIdentifier = false,$timeout = 60) {
if (($cache = getCache(md5("mysql_query". $sql))) !== false) {
$cache = false;
$r = ($linkIdentifier !== false)? mysql_query($sql,$linkIdentifier): mysql_query($sql);
if (is_resource($r) && (($rows = mysql_num_rows($r)) !== 0)) {
for ($i=0;$i<$rows;$i++) {
$fields = mysql_num_fields($r );
$row = mysql_fetch_array($r);
for ($j=0;$j<$fields;$j++) {
if ($i === 0) {
$columns[$j] ​​= mysql_field_name($r,$j);
}
$cache[$i][$columns[$j]] = $row[$j];
}
}
if (!setCache(md5("mysql_query". $sql),$cache,$timeout)) {
here we stir up a query to the database and return the answer
}
}
}
return $cache;
}
?>

V
Vasya_Sh, 2010-11-28
@Vasya_Sh

It is necessary to write it down in memcache, dump the info into the database using the cron (once every 5 minutes, for example). Those. keep all relevant data in the memcache.
Even better - install REDIS, chats with it are a pleasure.

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question