java獲取一個(gè)文本文件的編碼(格式)信息

更新時(shí)間：2022年09月22日 10:58:13 作者：java未來王者

這篇文章主要介紹了java獲取一個(gè)文本文件的編碼(格式)信息，文章圍繞主題展開詳細(xì)的內(nèi)容介紹，具有一定的參考價(jià)值，需要的小伙伴可以參考一下

前言：

文本文件是我們?cè)趙indows平臺(tái)下常用的一種文件格式，

這種格式會(huì)隨著操作系統(tǒng)的語(yǔ)言不同，而出現(xiàn)其默認(rèn)的編碼不同

那么如何使用程序獲取“文本文件”的編碼方式呢？

文件編碼的格式?jīng)Q定了文件可存儲(chǔ)的字符類型，所以得到文件的類型至關(guān)重要

下文筆者講述獲取一個(gè)文本文件的格式信息的方法分享，如下所示:

現(xiàn)思路:

通過獲取文件流的前3個(gè)字節(jié)
判斷其值的方式，即可獲取文本文件的編碼方式

例:

package com.java265.other;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
public class Test {
    /*
     * java265.com 獲取文本文件的編碼方式
     *  
     **/
    public static void main(String[] args)   {
      File file = new File("E://person/java265.com/java.txt");
      System.out.println(GetEncoding(file));
      }
    public static String GetEncoding(File file)
    {
        String charset = "GBK";
        byte[] first3Bytes = new byte[3];
        try {
            boolean checked = false; 
            InputStream is = new FileInputStream(file);
            int read = is.read(first3Bytes, 0, 3);
 
            if (read == -1)
                return charset;
            if (first3Bytes[0] == (byte) 0xFF && first3Bytes[1] == (byte) 0xFE) {
                charset = "UTF-16LE";
                checked = true;
            } else if (first3Bytes[0] == (byte) 0xFE
                    && first3Bytes[1] == (byte) 0xFF) {
                charset = "UTF-16BE";
                checked = true;
            } else if (first3Bytes[0] == (byte) 0xEF
                    && first3Bytes[1] == (byte) 0xBB
                    && first3Bytes[2] == (byte) 0xBF) {
                charset = "UTF-8";
                checked = true;
            }else if (first3Bytes[0] == (byte) 0xA
                    && first3Bytes[1] == (byte) 0x5B
                    && first3Bytes[2] == (byte) 0x30) {
                charset = "UTF-8";
                checked = true;
            }else if (first3Bytes[0] == (byte) 0xD
                    && first3Bytes[1] == (byte) 0xA
                    && first3Bytes[2] == (byte) 0x5B) {
                charset = "GBK";
                checked = true;
            }else if (first3Bytes[0] == (byte) 0x5B
                    && first3Bytes[1] == (byte) 0x54
                    && first3Bytes[2] == (byte) 0x49) {
                charset = "windows-1251";
                checked = true;
            }
            //bis.reset();
            InputStream istmp = new FileInputStream(file);
            if (!checked) {
                int loc = 0;
                while ((read = istmp.read()) != -1) {
                    loc++;
                    if (read >= 0xF0)
                        break;
                    if (0x80 <= read && read <= 0xBF)
                        break;
                    if (0xC0 <= read && read <= 0xDF) {
                        read = istmp.read();
                        if (0x80 <= read && read <= 0xBF)
                            continue;
                        else
                            break;
                    } else if (0xE0 <= read && read <= 0xEF) {
                        read = istmp.read();
                        if (0x80 <= read && read <= 0xBF) {
                            read = istmp.read();
                            if (0x80 <= read && read <= 0xBF) {
                                charset = "UTF-8";
                                break;
                            } else
                                break;
                        } else
                            break;
                    }
                }
            }
            is.close();
            istmp.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return charset; 
    }
}

到此這篇關(guān)于java獲取一個(gè)文本文件的編碼(格式)信息的文章就介紹到這了,更多相關(guān)java獲取文本編碼內(nèi)容請(qǐng)搜索腳本之家以前的文章或繼續(xù)瀏覽下面的相關(guān)文章希望大家以后多多支持腳本之家！

您可能感興趣的文章: