How to deal with UTF-8 encoding in Java?
There is a task:
Given: a file that contains UTF-8 text. The characters are arbitrary.
Required: get a string of a given length from a given location in the file and display it in the console.
The problem: using the RandomAccessFile class I get an array of bytes, and after converting it to a string I get 1 extra character (depending on whether I captured a space or not).
Can you please tell me how to properly decode a UTF-8 byte array into a string?
An example of a line in a file: Thank you for being you
Code:
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Main {
    public static final int CHARS_PER_PAGE = 19;

    public static void main(String[] args) throws IOException {
        System.out.println(getPage("test.txt", 0));
    }

    public static String getPage(String filePath, int pageNum) throws IOException {
        int startPos = CHARS_PER_PAGE * pageNum;
        byte[] pageBytes = new byte[CHARS_PER_PAGE];
        try (RandomAccessFile raf = new RandomAccessFile(filePath, "r")) {
            raf.seek(startPos);
            raf.read(pageBytes, 0, CHARS_PER_PAGE);
        }
        System.out.println("Bytes Array: " + Arrays.toString(pageBytes));
        String result = new String(pageBytes, StandardCharsets.UTF_8);
        System.out.println("Result String: " + result);
        return result;
    }
}
How to read a UTF-8 string from a file in general:
https://dzone.com/articles/read-utf-8-file-java
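// Pass the charset to the reader; FileReader("file") would decode with the platform default charset.
try (BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream("file"), StandardCharsets.UTF_8))) {
    String s;
    while ((s = in.readLine()) != null) {
        System.out.println(s); // s is already decoded as UTF-8; no re-encoding via getBytes() is needed
    }
}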
Required: get a string of a given length from a given location in the file
The point is that in UTF-8 each Unicode character is encoded with a number of octets that is not known in advance: anywhere from 1 to 4. For Cyrillic, there are 2 octets for each character, if I'm not mistaken. So a byte offset and a byte count in the file do not correspond to a character offset and a character count.
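One way to do what the question asks is to let a Reader decode the bytes and then count characters instead of bytes. Below is a minimal sketch, assuming that pages are measured in Java char values (UTF-16 code units, so a character outside the Basic Multilingual Plane, such as most emoji, counts as two) and that re-reading the file from the start on every call is acceptable. The class name Utf8Pager is made up for illustration; CHARS_PER_PAGE and getPage mirror the code in the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Utf8Pager {
    public static final int CHARS_PER_PAGE = 19;

    // Returns page number pageNum (0-based), measured in chars rather than bytes.
    // Returns a shorter or empty string when the file ends early.
    public static String getPage(String filePath, int pageNum) throws IOException {
        long charsToSkip = (long) CHARS_PER_PAGE * pageNum;
        try (BufferedReader reader =
                 Files.newBufferedReader(Paths.get(filePath), StandardCharsets.UTF_8)) {
            // skip() counts decoded characters and may skip fewer than requested,
            // so keep calling it until the page start or end of file is reached.
            while (charsToSkip > 0) {
                long skipped = reader.skip(charsToSkip);
                if (skipped <= 0) {
                    return ""; // the file ends before this page starts
                }
                charsToSkip -= skipped;
            }
            // read() may also return fewer chars than requested, so fill in a loop.
            char[] page = new char[CHARS_PER_PAGE];
            int filled = 0;
            while (filled < page.length) {
                int n = reader.read(page, filled, page.length - filled);
                if (n < 0) {
                    break; // end of file
                }
                filled += n;
            }
            return new String(page, 0, filled);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(getPage("test.txt", 0));
    }
}
```

With this approach getPage("test.txt", 0) returns the first 19 characters no matter how many bytes each of them occupies. The trade-off is that reaching page N means decoding everything before it: RandomAccessFile.seek can only jump to a byte offset, which in UTF-8 is not the same thing as a character offset.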