Java
espantos, 2016-03-14 12:56:40

How do I remove all spaces, tabs, symbols, etc. from a Java String?

I am reading text from a file into a string. I want to remove absolutely everything except letters (spaces, line breaks, paragraph breaks, punctuation, and so on) so that I can then count bigrams. I'm trying to do this with a regular expression, but I can't get it to work.

    FileInputStream inFile = new FileInputStream("c:\\bukovski.txt");
    byte[] str = new byte[inFile.available()];
    inFile.read(str);
    String text = new String(str);
    //String textWithoutspaces = new String();
    //text = FilterText.filterWithSpaces(text);
    String textWithoutSpaces = text.toLowerCase().replaceAll("//s+", "");
    System.out.println(textWithoutSpaces);

For starters, this expression fails to remove newlines in some cases. And the following variant returns nothing at all:
replaceAll("[^а-я]+", "");

3 answer(s)
Evhen, 2016-03-14
@espantos

import java.io.*;

public class CharCleaner {

  public static void main(String[] args) throws IOException {

    // try-with-resources closes (and flushes) both streams automatically.
    try (
        Reader reader = new BufferedReader(new FileReader(new File("sourceFile.txt")));
        Writer writer = new BufferedWriter(new FileWriter(new File("resultFile.txt")))
    ) {

      int ch;

      // Read one character at a time; -1 signals the end of the file.
      while ((ch = reader.read()) != -1) {
        // Copy only alphabetic characters; everything else is dropped.
        if (Character.isAlphabetic(ch)) {
          writer.write(ch);
        }
      }

      writer.flush();
    } catch (IOException e) {
      e.printStackTrace();
    }

  }

}

This approach does not load the entire file into memory, buffers both reading and writing, and avoids creating a pile of intermediate String objects the way the replace* methods do.
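One caveat worth noting: FileReader and FileWriter use the platform default charset (on Java versions before 18), which can mangle Cyrillic text if the file is stored in a different encoding. Below is a minimal sketch of the same streaming idea with the charset spelled out. It assumes the file is UTF-8; the class name StreamingLetterFilter and the method copyLetters are made up for illustration. It also swaps in Character.isLetter and lowercases the output, since the question lowercases the text before counting bigrams.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class, not part of the original answer.
class StreamingLetterFilter {

  // Copies only letters from 'in' to 'out', lowercased.
  // Works one UTF-16 code unit at a time, so letters outside the
  // Basic Multilingual Plane (surrogate pairs) are dropped.
  static void copyLetters(Reader in, Writer out) throws IOException {
    int ch;
    while ((ch = in.read()) != -1) {
      if (Character.isLetter(ch)) {
        out.write(Character.toLowerCase(ch));
      }
    }
  }

  public static void main(String[] args) throws IOException {
    // Assumption: both files are UTF-8; adjust the charset if they are not.
    Path source = Paths.get("sourceFile.txt");
    Path target = Paths.get("resultFile.txt");
    try (Reader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8);
         Writer writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
      copyLetters(reader, writer);
    }
  }
}
```

Because copyLetters takes a Reader and a Writer, it can be tried on a StringReader and StringWriter before pointing it at real files.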

Therapyx, 2016-03-14
@Therapyx

I recently wrote something to replace strings in a text file with 001, 002, 003, 004 ... 500 xD. I reworked it a bit so that the strings being replaced are a space and a newline. Just add more of your own variants and insert an additional replaceAll line for each of them, i.e. cover all the cases you need. In this example I only handled spaces and newlines.

import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.LineNumberReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Replace {

  public static void main(String[] args) throws IOException {

    // Count the lines by skipping to the end of the file.
    LineNumberReader lnr = new LineNumberReader(new FileReader(new File("C:/test.txt")));
    lnr.skip(Long.MAX_VALUE);
    System.out.println(lnr.getLineNumber() + 1 + " lines in total");
    lnr.close();

    Path path = Paths.get("C:/test.txt");
    Charset charset = StandardCharsets.UTF_8;

    String content = new String(Files.readAllBytes(path), charset);

    String space = " ";
    String newLine = "\n";
    // replaceAll already replaces every occurrence, so a single pass is enough.
    content = content.replaceAll(space, "");
    content = content.replaceAll(newLine, "");
    // Note: this overwrites the source file with the cleaned text.
    Files.write(path, content.getBytes(charset));
  }
}
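The "one line per variant" idea above can also be written as a list of unwanted strings that is applied in a loop, so adding a variant means adding one list entry. This is only a sketch: the class name VariantStripper and the contents of the list are assumptions for illustration. It uses String.replace, which treats its argument literally rather than as a regular expression and replaces every occurrence.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of the original answer.
class VariantStripper {

  // Assumed list of literal substrings to drop; extend it as needed.
  static final List<String> UNWANTED = Arrays.asList(" ", "\t", "\r", "\n");

  // Removes every occurrence of each unwanted substring.
  static String strip(String content) {
    for (String variant : UNWANTED) {
      // String.replace does a literal (non-regex) replacement of all occurrences.
      content = content.replace(variant, "");
    }
    return content;
  }

  public static void main(String[] args) {
    System.out.println(strip("one two\tthree\r\nfour"));  // onetwothreefour
  }
}
```

Including "\r" matters for files with Windows line endings, where removing only "\n" would leave carriage returns behind.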

MamOn, 2016-03-14
@MamOn

The slashes in your regexp point the wrong way. It should be: replaceAll("\\s+", "")
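To make the difference concrete, here is a minimal sketch (the class and method names are made up for illustration). In a Java string literal, "//s+" is the regex //s+, which matches two literal slashes followed by one or more letters s, whereas "\\s+" is the regex \s+, which matches runs of whitespace. For the original goal of keeping only letters, one option is \P{L} (anything that is not a Unicode letter), which covers Cyrillic as well as Latin. The sample string uses Unicode escapes so the source compiles regardless of file encoding.

```java
import java.util.Locale;

// Hypothetical helper class, not part of the original answer.
class RegexCleanup {

  // Removes every run of whitespace (spaces, tabs, line breaks).
  static String stripWhitespace(String text) {
    return text.replaceAll("\\s+", "");
  }

  // Removes everything that is not a Unicode letter, then lowercases.
  static String lettersOnly(String text) {
    return text.replaceAll("\\P{L}+", "").toLowerCase(Locale.ROOT);
  }

  public static void main(String[] args) {
    // "Privet, mir! 123" written in Cyrillic, with a line break and a tab.
    String sample = "\u041f\u0440\u0438\u0432\u0435\u0442,\r\n\t\u043c\u0438\u0440! 123";

    // The pattern from the question matches nothing here, so the text is unchanged.
    System.out.println(sample.replaceAll("//s+", "").equals(sample));  // true

    // Whitespace removed, punctuation and digits kept.
    System.out.println(stripWhitespace(sample));

    // Only lowercase letters remain.
    System.out.println(lettersOnly(sample));
  }
}
```

If the letters-only result still comes out empty for a real file, a likely suspect is the decoding step rather than the regex: new String(bytes) without an explicit charset relies on the platform default, which may not match the file's encoding.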
