Delphi-PRAXiS - DELPHI & TInifiles ASCII and UTF16

DELPHI & TInifiles ASCII and UTF16

As far as I know, Delphi's TIniFile can only handle ASCII and UTF-16 without BOM ( https://www.delphipraxis.net/208415-...-vs-ascii.html ). We still use *.ini files a lot. When our customers edit an *.ini file with other editors, the encoding sometimes gets changed.
So far I have come up with these two functions:

check whether the file is in the correct format and, if not, rewrite the file in the correct encoding.
Are there better approaches to this?

Delphi source code:

// Detection ASCII / UTF-16 (needs System.SysUtils and System.Classes in the uses clause)
function IsAsciiOrUtf16WithoutBOM(const FileName: string): Boolean;
var
  FileStream: TFileStream;
  Buffer: TBytes;
  i: Integer;
  NumRead: Integer;
begin
  Result := True; // Assume the file is ASCII or UTF-16 without BOM
  FileStream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyNone);
  try
    SetLength(Buffer, FileStream.Size);
    if Length(Buffer) = 0 then
      Exit; // Empty file: nothing to check, and Buffer[0] would be out of range
    NumRead := FileStream.Read(Buffer[0], Length(Buffer));
    // Check the first two bytes for UTF-16 BOMs
    if NumRead >= 2 then
    begin
      if (Buffer[0] = $FF) and (Buffer[1] = $FE) then
        Exit(False); // UTF-16 Little Endian BOM
      if (Buffer[0] = $FE) and (Buffer[1] = $FF) then
        Exit(False); // UTF-16 Big Endian BOM
    end;
    // Check for non-ASCII bytes, which disqualify the file as ASCII
    for i := 0 to NumRead - 1 do
    begin
      if Buffer[i] > 127 then
      begin
        Exit(False); // Non-ASCII character found
      end;
    end;
  finally
    FileStream.Free;
  end;
end;

// Conversion to UTF-16 (needs System.SysUtils and System.IOUtils in the uses clause)
function Convert2UTF16(const FileName: string): Boolean;
var
  FileContent: TBytes;
  Encoding: TEncoding;
begin
  Result := False; // Default to False
  try
    // Read the file content into a byte array
    FileContent := TFile.ReadAllBytes(FileName);
    // Check if the file begins with a UTF-16 Byte-Order Mark (BOM)
    if (Length(FileContent) >= 2) then
    begin
      if ((FileContent[0] = $FF) and (FileContent[1] = $FE)) or
         ((FileContent[0] = $FE) and (FileContent[1] = $FF)) then
      begin
        Result := True; // It's UTF-16
        Exit; // No need to convert
      end;
    end;
    // Convert to UTF-16
    Encoding := TEncoding.Unicode; // Default to little endian UTF-16
    // TEncoding.Default is the system ANSI code page on Windows,
    // so this step assumes the source file is ANSI-encoded
    FileContent := TEncoding.Convert(TEncoding.Default, Encoding, FileContent);
    // Save the UTF-16 content back to the file, including BOM
    TFile.WriteAllBytes(FileName, Encoding.GetPreamble + FileContent);
    Result := True; // Successfully converted to UTF-16
  except
    // Handle exceptions (e.g., file not found, access denied)
    // You can customize the error handling as needed;
    // as written, any exception is swallowed and the function returns False
  end;
end;

RE: DELPHI & TInifiles ASCII and UTF16

Have you tried TMemIniFile?

RE: DELPHI & TInifiles ASCII and UTF16

A heuristic for detecting the encoding can never work 100% reliably, and that is exactly what you are attempting here. It would be better, imho, to specify one clearly defined format for the ini file. UTF-8 together with TMemIniFile in the code would be a sensible choice, imho.
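
To illustrate the suggestion above, here is a minimal sketch of reading and writing an ini file with a fixed UTF-8 encoding via TMemIniFile. The function name RoundTrip, the file name sketch.ini and the section/key names are made up for this example; it assumes a Unicode Delphi version in which TMemIniFile.Create accepts a TEncoding parameter.

```delphi
program IniUtf8Sketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.IOUtils, System.IniFiles;

// Hypothetical helper: writes one value and reads it back,
// always using UTF-8 as the agreed file format.
function RoundTrip(const FileName, Value: string): string;
var
  Ini: TMemIniFile;
begin
  Ini := TMemIniFile.Create(FileName, TEncoding.UTF8);
  try
    Ini.WriteString('Greetings', 'value1', Value);
    Ini.UpdateFile; // TMemIniFile keeps changes in memory until UpdateFile
  finally
    Ini.Free;
  end;

  // Re-open the file with the same encoding and read the value back
  Ini := TMemIniFile.Create(FileName, TEncoding.UTF8);
  try
    Result := Ini.ReadString('Greetings', 'value1', '');
  finally
    Ini.Free;
  end;
end;

var
  FileName, Text: string;
begin
  FileName := TPath.Combine(TPath.GetTempPath, 'sketch.ini');
  Text := 'Hall' + #$00F6 + 'le'; // "Hallöle", written with an explicit code point
  Writeln(RoundTrip(FileName, Text) = Text);
end.
```

With a fixed encoding the program no longer has to guess; a file saved in a different encoding by an editor is then simply a file in the wrong format.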

I also think you are mixing up ANSI and ASCII. TIniFile does, afaik, support more than ASCII: umlauts, for example, are no problem as long as the ini file is not also read on other systems with different regional settings. Your function IsAsciiOrUtf16WithoutBOM therefore does not do what it is supposed to, because with

Delphi source code:

if Buffer[i] > 127 then
begin
  Exit(False); // Non-ASCII character found
end;

an ini file like this one (saved as UTF-16 without BOM), for example,

Code:

[Greetings]
value1=Hallöle

still makes it return False, because the "ö" is encoded as $F6 $00, and $F6 is greater than 127.
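
The byte values mentioned above can be checked with a small sketch. The helper name HasByteAbove127 is made up for this example; it only mirrors the byte-wise "> 127" test on the UTF-16 LE encoding of a string and is not part of the original function.

```delphi
program Utf16ByteSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: returns True if any byte of the UTF-16 LE
// encoding of S is above 127, i.e. if a byte-wise "> 127" test
// would reject the text.
function HasByteAbove127(const S: string): Boolean;
var
  B: Byte;
begin
  Result := False;
  for B in TEncoding.Unicode.GetBytes(S) do
    if B > 127 then
      Exit(True);
end;

begin
  Writeln(HasByteAbove127('Hallo'));                // prints FALSE: ASCII only
  Writeln(HasByteAbove127('Hall' + #$00F6 + 'le')); // prints TRUE: "ö" is $F6 $00
end.
```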

RE: DELPHI & TInifiles ASCII and UTF16

A UTF-8 file with BOM still starts with the BOM, the Unicode character #$FEFF, even if the text itself contains only ASCII characters.
In UTF-16 the BOM is stored according to the respective byte order, but the 3 BOM bytes of UTF-8 are actually this same #$FEFF as well (just encoded as UTF-8).
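
As a quick check of this point, the following sketch prints the preambles that TEncoding reports and compares them with the character #$FEFF encoded as UTF-8. The helper ToHex and the constant BomChar are made up for this example.

```delphi
program BomSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

const
  BomChar: string = #$FEFF; // the Unicode character used as BOM

// Hypothetical helper: formats a byte array as space-separated hex.
function ToHex(const Bytes: TBytes): string;
var
  B: Byte;
begin
  Result := '';
  for B in Bytes do
    Result := Result + IntToHex(B, 2) + ' ';
  Result := Trim(Result);
end;

begin
  Writeln('UTF-8 preamble:     ', ToHex(TEncoding.UTF8.GetPreamble));             // EF BB BF
  Writeln('U+FEFF as UTF-8:    ', ToHex(TEncoding.UTF8.GetBytes(BomChar)));       // EF BB BF
  Writeln('UTF-16 LE preamble: ', ToHex(TEncoding.Unicode.GetPreamble));          // FF FE
  Writeln('UTF-16 BE preamble: ', ToHex(TEncoding.BigEndianUnicode.GetPreamble)); // FE FF
end.
```

A check that only looks for $FF $FE or $FE $FF therefore misses the UTF-8 BOM, which has to be tested as the three bytes $EF $BB $BF.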