How do I transfer separate messages that do not fit into a single packet?


BUTURUM, 2021-11-18 14:02:34

Computer networks


Our beautiful world has a wonderful data transfer protocol called TCP/IP. It transfers information by splitting it into packets. Packets are fine, but I still don't understand how TCP-based protocols manage to send whole messages spread over several packets. Take HTTP: even a GET request is unlikely to be sent in a single packet, yet somehow the server understands that all the packets have arrived and it can start parsing the request, even though the connection stays open. I want to write a protocol for my application (a local messenger) that can transmit encrypted messages over a TCP/IP connection from both participants, preferably in parallel. As you can imagine, the messages will not fit into one packet, so I need some way to tell the recipient that all packets of a message have been transmitted.


3 answer(s)


Ronald McDonald, 2021-11-18
@Zoominger

Study the OSI model.
The data TCP carries (layer 4) has nothing to do with what is inside a GET request (layer 7); TCP does not care about it at all.
And study TCP itself in more detail (read Kurose and Olifer): they explain simply and clearly how TCP performs flow control, how it decides how many parts to split the data into, and how it knows that all the parts have arrived.
Your protocol will live entirely at the application layer (layer 7), and TCP does not care whether the data it carries is encrypted or not.


Yaroslav, 2021-11-18
@yaror

It's all quite simple :)
When you work with TCP, you do not deal with "packets" at all: TCP hides from you the fact that the data is broken into separately transmitted fragments along the way.
Likewise, an encryption layer such as SSL/TLS is implemented transparently by most libraries, so your program may not even notice that the data is encrypted when it is sent and decrypted when it is received.
Sending and receiving over a TCP connection works exactly like working with a disk file: there is a stream of bytes that the protocol does not divide into individual messages. You have to invent the "a new message starts here" marker yourself.
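A quick way to see that a stream keeps no message boundaries is to write to and read from a pair of connected stream sockets. This sketch uses Python's `socket.socketpair()` (a local stream socket pair) as a stand-in for a real TCP connection, which is an assumption of the example; the byte-stream behaviour it shows is the same. One send is consumed by two reads, and two sends are consumed by one read.

```python
import socket

# socketpair() returns two connected stream sockets; here they stand in for a TCP connection.
a, b = socket.socketpair()

# One send of 11 bytes can be consumed by several reads...
a.sendall(b"hello world")
first = b.recv(5)    # returns only what was asked for
rest = b.recv(100)   # the remaining bytes; "hello" is gone and cannot be read again

# ...and two sends can be consumed by a single read: the stream keeps no boundary between them.
a.sendall(b"abc")
a.sendall(b"def")
merged = b.recv(6, socket.MSG_WAITALL)  # block until 6 bytes are available

a.close()
b.close()
print(first, rest, merged)  # b'hello' b' world' b'abcdef'
```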
A popular choice, for example, is TLV (Tag-Length-Value) framing:
Tag: the message type, or some marker for the start of a new message. Typically a fixed-length field, e.g. 1 byte.
Length: the length of the message. A fixed-length field, e.g. 4 bytes.
Value: the message itself, of variable length.
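The TLV layout above can be encoded and decoded with Python's `struct` module. This is a minimal sketch under stated assumptions: Tag is 1 byte, Length is a 4-byte big-endian unsigned integer that counts only the Value bytes, and the helper names `encode_tlv`/`decode_tlv` are hypothetical.

```python
import struct

# Assumptions for this sketch: Tag = 1 byte, Length = 4-byte big-endian unsigned
# integer, and Length counts only the Value bytes.
HEADER = struct.Struct(">BI")  # ">" means no padding, so the header is exactly 5 bytes

def encode_tlv(tag: int, value: bytes) -> bytes:
    """Build one frame: Tag, Length, then the Value bytes themselves."""
    return HEADER.pack(tag, len(value)) + value

def decode_tlv(buf: bytes):
    """Split one complete frame off the front of buf; return (tag, value, rest)."""
    if len(buf) < HEADER.size:
        raise ValueError("incomplete header")
    tag, length = HEADER.unpack_from(buf)
    end = HEADER.size + length
    if len(buf) < end:
        raise ValueError("incomplete value")
    return tag, buf[HEADER.size:end], buf[end:]

frame = encode_tlv(1, b"hi")
print(frame)  # b'\x01\x00\x00\x00\x02hi'

stream = frame + encode_tlv(2, b"there")  # two frames back to back, as on a TCP stream
t1, v1, stream = decode_tlv(stream)
t2, v2, stream = decode_tlv(stream)
print(t1, v1, t2, v2, stream)  # 1 b'hi' 2 b'there' b''
```

With a 4-byte Length, a single Value can be at most 2**32 - 1 bytes; a receiver would probably also want its own, much smaller cap so that a corrupt or hostile Length cannot make it allocate gigabytes.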
It is up to you to decide what exactly Length counts.
You can do this:
Length = field_length_in_bytes(Value)
Or you can do this:
Length = field_length_in_bytes(Tag) + field_length_in_bytes(Length) + field_length_in_bytes(Value)
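The two Length conventions differ only by the header size, but the reader has to know which one is in use. A sketch, assuming the 1-byte Tag and 4-byte big-endian Length from above; `encode` and `value_size` are hypothetical helper names.

```python
import struct

HEADER_SIZE = 1 + 4  # Tag (1 byte) + Length (4 bytes), as in the layout above

def encode(tag: int, value: bytes, length_includes_header: bool) -> bytes:
    """Encode one frame using either Length convention (big-endian is an assumption)."""
    length = len(value) + (HEADER_SIZE if length_includes_header else 0)
    return struct.pack(">BI", tag, length) + value

def value_size(length: int, length_includes_header: bool) -> int:
    """How many Value bytes follow the header, given the Length field just read."""
    if not length_includes_header:
        return length
    if length < HEADER_SIZE:
        raise ValueError("corrupt frame: Length is smaller than the header itself")
    return length - HEADER_SIZE

only_value = encode(7, b"abc", length_includes_header=False)
whole_frame = encode(7, b"abc", length_includes_header=True)
print(struct.unpack(">I", only_value[1:5])[0])     # 3
print(struct.unpack(">I", whole_frame[1:5])[0])    # 8
print(value_size(8, length_includes_header=True))  # 3
```

One small side benefit of the second convention is a free sanity check: any Length below 5 cannot be valid.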
You need not worry that you will start reading a TCP connection "from the middle": the protocol guarantees that you receive the bytes in exactly the order in which the other side sent them.
However, watch out for the pitfalls that beginners often run into:
1. Once you have read a byte from the connection, it is gone: you cannot read it again. The next read returns the byte that was sent right after the last one you read.
2. Be prepared to start reading a message while the other side has already begun sending it but has not finished yet.
However, at any moment you can check how many bytes have already arrived and accumulated in the receive buffer.
Example: say you are expecting a 100-byte message.
You may read 50 bytes from the connection and then get a "no more data" notification. That means the remaining 50 bytes are still on their way; you should keep repeating the read until you have received the rest, possibly in several steps.
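The "check how many bytes have accumulated" approach can be sketched on Unix with the `FIONREAD` ioctl, which reports the number of bytes waiting in a socket's receive buffer. Assumptions: a Unix-like OS (the `fcntl` and `termios` modules are Unix-only), `socket.socketpair()` standing in for a TCP connection, and the hypothetical helpers `pending_bytes`/`wait_for_bytes`. It replays the 100-byte example: 50 bytes arrive first, the rest a little later.

```python
import fcntl
import socket
import struct
import termios
import threading
import time

def pending_bytes(sock: socket.socket) -> int:
    """Bytes already waiting in the receive buffer (FIONREAD ioctl, Unix-only)."""
    raw = fcntl.ioctl(sock.fileno(), termios.FIONREAD, struct.pack("i", 0))
    return struct.unpack("i", raw)[0]

def wait_for_bytes(sock: socket.socket, n: int, poll_interval: float = 0.01) -> None:
    """Poll until at least n bytes are buffered.

    Only safe when n fits in the receive buffer; otherwise the buffer can fill up
    before n bytes arrive and this loop may never end.
    """
    while pending_bytes(sock) < n:
        time.sleep(poll_interval)

sender, receiver = socket.socketpair()  # stand-in for the two ends of a TCP connection

sender.sendall(b"A" * 50)               # the first half of a 100-byte message
first_pending = pending_bytes(receiver)
print(first_pending)                    # 50: the rest is still on its way

def send_second_half() -> None:
    time.sleep(0.1)                     # the second half shows up a little later
    sender.sendall(b"B" * 50)

late = threading.Thread(target=send_second_half)
late.start()
wait_for_bytes(receiver, 100)           # repeat the check until the whole message is here
message = receiver.recv(100)            # all 100 bytes are already buffered
late.join()
sender.close()
receiver.close()
print(len(message))                     # 100
```

Polling is the simplest way to express "wait until N bytes are there", but for large messages it is usually better to read whatever has arrived and accumulate it yourself, since a message larger than the receive buffer will never be fully buffered at once.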
So the simplest, though not the most efficient, logic for reading each new TLV-framed message could look like this:

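def read_exact(sock, n):
    # "Wait until n bytes are in the receive buffer, then read them":
    # recv() may return fewer bytes than asked for, so keep reading until we have n.
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:  # b"" means the other side closed the connection
            raise ConnectionError("connection closed in the middle of a message")
        buf += chunk
    return bytes(buf)

def read_messages(sock, process_new_message):
    while True:  # until the connection is closed
        tag = sock.recv(1)  # Tag: 1 byte (b"" if the peer closed between messages)
        if not tag:
            return
        length = int.from_bytes(read_exact(sock, 4), "big")  # Length: 4 bytes, big-endian assumed, counts Value only
        value = read_exact(sock, length)                     # Value: `length` bytes
        process_new_message(tag[0], value)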

This logic correctly handles the case where someone sends you two messages in a row and, at first, the whole first message reaches you but only the first half of the second one, because that is all that fit into one TCP packet. The second half of the second message, in the next TCP packet, may arrive even half an hour later.
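That scenario can be reproduced end to end: the sender writes message 1 plus the first part of message 2 in one go, pauses, then sends the rest. This is a self-contained sketch under the same assumptions as before (1-byte Tag, 4-byte big-endian Length counting only the Value, `socket.socketpair()` instead of real TCP, and a 0.2 s pause instead of half an hour); `encode_tlv`, `read_exact` and `read_messages` are hypothetical names.

```python
import socket
import struct
import threading
import time

HEADER = struct.Struct(">BI")  # assumed: 1-byte Tag, 4-byte big-endian Length (Value only)

def encode_tlv(tag: int, value: bytes) -> bytes:
    return HEADER.pack(tag, len(value)) + value

def read_exact(sock: socket.socket, n: int) -> bytes:
    """Keep calling recv() until n bytes are collected; recv() may return less."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-frame")
        buf += chunk
    return bytes(buf)

def read_messages(sock: socket.socket) -> list:
    """Read TLV frames until the peer closes the connection between frames."""
    messages = []
    while True:
        first = sock.recv(1)  # Tag byte, or b"" on a clean close
        if not first:
            return messages
        tag, length = HEADER.unpack(first + read_exact(sock, HEADER.size - 1))
        messages.append((tag, read_exact(sock, length)))

sender, receiver = socket.socketpair()  # stand-in for a TCP connection

def send_in_pieces() -> None:
    second = encode_tlv(2, b"second message")
    # Message 1 and the first part of message 2 arrive together...
    sender.sendall(encode_tlv(1, b"first message") + second[:8])
    time.sleep(0.2)  # ...and the rest of message 2 arrives later.
    sender.sendall(second[8:])
    sender.close()

worker = threading.Thread(target=send_in_pieces)
worker.start()
received = read_messages(receiver)
worker.join()
receiver.close()
print(received)  # [(1, b'first message'), (2, b'second message')]
```

The reader never looks at how the bytes were chunked in transit; it only trusts the Length field, which is exactly why the split second message causes no trouble.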


res2001, 2021-11-18
@res2001

TCP/IP is a suite of protocols, not just TCP.
TCP specifically is a connection-oriented stream protocol.
"Stream" means that the data of the higher-layer protocol running on top of TCP is not sent as individual packets but as a potentially unbounded stream of bytes.
So you can easily push gigabytes over TCP without bothering to split them into packets: TCP itself divides everything into separate packets and, on the receiving side, reassembles them into a stream.
Byte order is preserved, and anything lost in transit is retransmitted.
The protocol designed for transmitting individual messages (datagrams) is UDP. It is also part of the TCP/IP protocol suite.
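The difference from TCP is easy to observe on the loopback interface: with UDP, each receive call returns exactly one whole datagram, so message boundaries survive. A minimal sketch, assuming a local machine where binding to `127.0.0.1` works; port 0 asks the OS for any free port. Keep in mind that UDP does not guarantee delivery or ordering; on loopback losses are unlikely, but not impossible in general.

```python
import socket

# Two UDP sockets on the loopback interface; port 0 lets the OS pick a free port.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)  # do not hang forever if a datagram is lost
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

addr = receiver.getsockname()
sender.sendto(b"abc", addr)
sender.sendto(b"def", addr)

# Each recvfrom() returns one datagram, even though 1024 bytes would fit both.
first, _ = receiver.recvfrom(1024)
second, _ = receiver.recvfrom(1024)
sender.close()
receiver.close()
print(first, second)
```

This is the trade-off in a nutshell: UDP gives you message boundaries but leaves reliability to you, while TCP gives you reliability and ordering but leaves message boundaries to you.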