What are bytes and bits, how do they work, and a little about sound?
1 byte is 8 bits; a bit can be 1 or 0 (I have read the wiki). The largest number that fits in 1 byte is 255, or FF in hexadecimal. It all seems clear, but when I think about it, I can't even describe my problem briefly and clearly.
Right now I'm trying to work with images and sound at a low level (just reading bytes), with uncompressed formats (for now).
What I know about images is RGB from 0 to 255 which means there are 3 bytes per pixel, right?
What I know about sound (which is pretty much a dark forest to me): there are oscillations, for example 10 Hz, that is, 10 oscillations per second, and there are different sound qualities, so to speak, 8-16 bytes as I understand it; let's take 8 bytes. That means 1 second of sound at 10 Hz comes to 80 bytes. Well, so it seems to me. =)
So, to convert the oscillations into decimal numbers, I need to split the file into chunks of 8 or 16 bytes and somehow print these numbers to the console, so that later I can build an oscillogram from them.
In general, I'm confused
How do I find out the sound quality? That is, how many bytes to read at a time.
Say we have these 8 or 16 bytes; how do I convert them to decimal? In a hex editor the data is shown byte by byte in hexadecimal, so 8 bytes look like FF FF FF FF FF FF FF FF, for example, but how do I work out which single number is written across all 8 or 16 bytes? I just went and looked at the Integer (data type) wiki page and got completely confused. Isn't 8 bytes a lot for recording sound? =) It looks like it is in bits after all.
How do I output this data to the console?
In general, where can I read about working at the bit or byte level? Somewhere that explains what I want to know.
Phew, I've written complete nonsense. I apologize in advance, but I have no one to ask. Put my brains back in place!
Anything is welcome, from a kick in the ass in the form of a Google link with the query already typed in =) to old-school books.
P.S. Thanks, everyone.
Digital audio has 2 main parameters: sampling rate and bit depth (sometimes loosely called bitrate). When encoding, the main task is to approximate the analog oscillation curve with a series of bars. The sampling rate sets the width of each bar (how many bars there are per second), and the bit depth sets how many BITS encode the height of each bar. The higher the sampling rate, the narrower the bars; the higher the bit depth, the more precisely the height of each bar can be specified. So as these two parameters increase, the discrete curve approaches the analog one.
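To make the "bars" picture concrete, here is a minimal Python sketch. It assumes a pure sine wave as the analog signal; the function name and the parameter values are made up for illustration.

```python
import math

def sample_and_quantize(freq_hz, sample_rate, bits, duration_s):
    """Sample a sine wave and round every sample to a signed integer of `bits` bits."""
    max_level = 2 ** (bits - 1) - 1            # e.g. 7 for 4 bits, 32767 for 16 bits
    count = round(sample_rate * duration_s)    # how many bars fit into the duration
    samples = []
    for n in range(count):
        t = n / sample_rate                            # time of the n-th bar
        value = math.sin(2 * math.pi * freq_hz * t)    # "analog" value in [-1, 1]
        samples.append(round(value * max_level))       # quantized height of the bar
    return samples

# One period of a 10 Hz tone, sampled 40 times per second (4 bars per period).
coarse = sample_and_quantize(freq_hz=10, sample_rate=40, bits=4, duration_s=0.1)
fine = sample_and_quantize(freq_hz=10, sample_rate=40, bits=16, duration_s=0.1)
print(coarse)   # heights limited to -7..7
print(fine)     # heights limited to -32767..32767
```

Raising `sample_rate` gives more (narrower) bars per second; raising `bits` gives more possible heights for each bar.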
What I know about images is RGB from 0 to 255 which means there are 3 bytes per pixel, right?
Regarding sound, I recently answered here, with links and examples: Working with C++ sound how?
As for pictures, that holds only for primitive formats like BMP. There, to get N pixels it really is enough to read 3*N bytes. All other graphics formats (with rare exceptions) compress the image, very often lossily (JPEG especially).
A small experiment to check this: take 2 different pictures with the same dimensions and save them in different formats. In BMP the file sizes will be the same; in all the other formats they will differ, sometimes by a lot.
If each pixel occupied exactly 3 bytes, or any other fixed number of bytes, then images with the same resolution would always have the same size, because the size would be strictly proportional to the number of pixels. In practice that is mostly true only for BMP.
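The arithmetic can be sketched in a few lines of Python. The function names are made up for this example, and the sizes cover pixel data only, without file headers. One detail that is easy to trip over when reading the bytes by hand: in an uncompressed 24-bit BMP each pixel row is padded to a multiple of 4 bytes.

```python
def raw_rgb_size(width, height, bytes_per_pixel=3):
    """Bytes needed for tightly packed pixels: 3 * N for 24-bit RGB."""
    return width * height * bytes_per_pixel

def bmp_pixel_data_size(width, height, bytes_per_pixel=3):
    """Same, but with every row rounded up to a multiple of 4 bytes, as BMP stores it."""
    row = width * bytes_per_pixel
    padded_row = (row + 3) // 4 * 4
    return padded_row * height

print(raw_rgb_size(1920, 1080))         # 6220800 bytes, whatever the picture shows
print(bmp_pixel_data_size(1920, 1080))  # same here: 1920 * 3 is already a multiple of 4
print(bmp_pixel_data_size(2, 2))        # 16, not 12: each 6-byte row is padded to 8
```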
For sound, the analog of BMP is WAV. It also stores data in proportion to duration. Here it is worth at least opening the wiki to see how WAV stores sound and what the sampling rate is. Recordings of the same duration (with the same parameters) will have the same size.
BMP and WAV are simple formats designed for easy streaming reads and writes, but they are rarely used because of their large size; much more complex formats are used to keep files smaller. Those are quite hard to read without ready-made library functions that decode them into an understandable form, such as a set of 3 bytes per pixel.
For images, that is roughly right. For sound, not quite.
As mentioned above, digital audio in PCM (pulse-code modulation) format has 2 main characteristics: bit depth and sampling rate.
Bit depths go up to 32 bits (bits, not bytes! 32 bits is 4 bytes). For example, a CD uses 16 bits (2 bytes) per sample, telephony uses 8 bits (1 byte) per sample, and modern music is encoded at 24 bits and sometimes 32. The higher the bit depth, the more amplitude gradations can be represented, so the amplitude is reproduced more accurately: 8 bits in telephony give 256 values, 16 bits on a CD give 65536.
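The numbers of gradations follow directly from the bit count. A quick Python check (the helper names are made up for this example):

```python
def levels(bits):
    """How many distinct amplitude values fit into `bits` bits."""
    return 2 ** bits

def signed_range(bits):
    """Smallest and largest value if the sample is stored as a signed integer."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

for bits in (8, 16, 24, 32):
    low, high = signed_range(bits)
    print(f"{bits:2d} bits = {bits // 8} byte(s): {levels(bits)} values, signed {low}..{high}")
```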
The sampling rate is how many of these samples are captured (stored/played back) per second.
The higher the sampling rate, the better high frequencies are preserved. As is well known, the maximum frequency that can be represented equals half the sampling rate. For a CD the sampling rate is 44100 Hz, so the maximum sound frequency on an audio disc is about 22 kHz. In a phone the sampling rate is only 8000 Hz, so it will carry the sound of a violin, for example, very poorly, since frequencies above 4 kHz are cut off.
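This also answers the "how many bytes per second" part of the question: the size of uncompressed PCM depends on the sampling rate, the bit depth and the number of channels, not on the pitch of the sound. A rough Python sketch (the function names are made up, and whole-byte sample sizes are assumed):

```python
def nyquist_hz(sample_rate):
    """Highest frequency that can be represented at this sampling rate."""
    return sample_rate / 2

def pcm_bytes_per_second(sample_rate, bits, channels):
    """Uncompressed PCM data rate, assuming each sample occupies bits / 8 whole bytes."""
    return sample_rate * (bits // 8) * channels

print(nyquist_hz(44100))                   # 22050.0 Hz for a CD
print(nyquist_hz(8000))                    # 4000.0 Hz for telephony
print(pcm_bytes_per_second(44100, 16, 2))  # 176400 bytes per second of stereo CD audio
print(pcm_bytes_per_second(8000, 8, 1))    # 8000 bytes per second of telephone audio
```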
How to output the data to the console depends on the container the data is stored in (the file format). You also have to take into account that 16/24/32-bit numbers (and sound samples are just numbers) can be signed (positive and negative) or unsigned (non-negative only), integer or fractional (floating point; 32-bit audio is usually like this), and which byte order is used (so-called endianness: for example, the number 65536 as a 24-bit value can be stored in the file as 01 00 00 or as 00 00 01).
That is, the same bytes in a file can be interpreted differently depending on these parameters.
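A short Python illustration of how the same bytes turn into different decimal numbers. It uses only the standard `int.from_bytes` and `struct.unpack`; the byte values are picked to match the examples in this thread.

```python
import struct

raw = bytes([0x01, 0x00, 0x00])        # three bytes as seen in a hex editor: 01 00 00
print(int.from_bytes(raw, "big"))      # 65536 (most significant byte first)
print(int.from_bytes(raw, "little"))   # 1     (least significant byte first)

ff = bytes([0xFF, 0xFF])               # FF FF
print(int.from_bytes(ff, "little", signed=False))  # 65535 if unsigned
print(int.from_bytes(ff, "little", signed=True))   # -1 if signed (two's complement)

eight = bytes([0xFF] * 8)              # FF FF FF FF FF FF FF FF from the question
print(int.from_bytes(eight, "little")) # 18446744073709551615, i.e. 2**64 - 1

# struct does the same for fixed-size types: "<h" is a little-endian signed 16-bit integer.
print(struct.unpack("<h", b"\x00\x80")[0])  # -32768
```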
If you need to read data from a file and display it, read the documentation for the format you need.
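For WAV specifically, Python's standard `wave` module already parses the header, so a minimal sketch could look like the one below. Assumptions: the file is 16-bit PCM (stored as signed little-endian integers), and a tiny test file is generated in memory only so that the example runs without any real file; `make_test_wav` and `read_wav_samples` are made-up helper names.

```python
import io
import math
import struct
import wave

def make_test_wav(freq_hz=440, sample_rate=8000, seconds=0.01):
    """Build a tiny mono 16-bit WAV in memory (a stand-in for a real file)."""
    count = round(sample_rate * seconds)
    samples = [round(32767 * math.sin(2 * math.pi * freq_hz * n / sample_rate))
               for n in range(count)]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)            # mono
        w.setsampwidth(2)            # 2 bytes = 16 bits per sample
        w.setframerate(sample_rate)
        w.writeframes(struct.pack("<%dh" % count, *samples))
    buf.seek(0)
    return buf

def read_wav_samples(source):
    """Return (sampling rate, channel count, list of samples) for a 16-bit PCM WAV."""
    with wave.open(source, "rb") as r:
        if r.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = r.getframerate()
        channels = r.getnchannels()
        raw = r.readframes(r.getnframes())
    # Every sample is 2 bytes, little-endian, signed.
    values = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return rate, channels, list(values)

rate, channels, samples = read_wav_samples(make_test_wav())
print(rate, channels, len(samples))
print(samples[:8])   # decimal numbers, ready to print or plot
```

With a real file, `read_wav_samples("some_file.wav")` should work the same way, since `wave.open` accepts a path as well as a file object (the file name here is hypothetical).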
Displaying an oscillogram is not tricky: take as many samples from the file as you want points along the horizontal axis, and each sample (number) gives the deflection along the Y axis (zero will be at the bottom or in the middle, depending on whether the format is signed). Keep in mind that the file may be stereo, in which case the samples of the two channels are interleaved.
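A rough console version in Python, assuming signed 16-bit samples (so zero sits in the middle row). The helper names, the default sizes and the generated test signal are all made up for this sketch.

```python
import math

def split_channels(samples, channels):
    """De-interleave: with 2 channels the samples go L R L R ..."""
    return [samples[c::channels] for c in range(channels)]

def render_oscillogram(samples, width=60, height=11, full_scale=32768):
    """Return text lines with one '*' per column; signed samples, zero in the middle row."""
    step = max(1, len(samples) // width)      # take every step-th sample
    points = samples[::step][:width]
    rows = [[" "] * len(points) for _ in range(height)]
    mid = height // 2
    for x, value in enumerate(points):
        y = mid - round(value / full_scale * mid)   # positive values go up
        y = min(height - 1, max(0, y))              # clamp to the drawing area
        rows[y][x] = "*"
    return ["".join(row) for row in rows]

# One period of a sine wave as test data.
test_wave = [round(32767 * math.sin(2 * math.pi * n / 60)) for n in range(60)]
lines = render_oscillogram(test_wave)
print("\n".join(lines))
```

For a stereo file, run `split_channels(samples, 2)` first and draw each channel separately.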
However, oscillograms are rarely used to analyze sound; it is much clearer to analyze its spectrum (for that you need a Fourier transform; google FFT).
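To show what the spectrum gives you, here is a deliberately naive discrete Fourier transform in plain Python. It is not an FFT: it computes the same result for these frequency bins but in O(n^2) time, so for real data you would use an FFT routine such as `numpy.fft.rfft`. The function names and the test tone are assumptions for illustration.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Magnitude of each frequency bin from 0 up to half the sampling rate."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2 + 1):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        magnitudes.append(abs(acc))
    return magnitudes

def dominant_frequency(samples, sample_rate):
    """Frequency in Hz of the strongest bin, ignoring bin 0 (the constant offset)."""
    magnitudes = dft_magnitudes(samples)
    k = max(range(1, len(magnitudes)), key=lambda i: magnitudes[i])
    return k * sample_rate / len(samples)

sample_rate = 800
tone = [math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(80)]  # 100 Hz tone
print(dominant_frequency(tone, sample_rate))   # 100.0
```

Bin k corresponds to the frequency k * sample_rate / n, so with 80 samples at 800 Hz the bins are 10 Hz apart.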