VARCHAR, VARCHAR(MAX), TEXT: Gibt es guten OnDisk-Space Vergleich verschiedener DB ?

**Rollo62**

Hallo zusammen,

es geht um die Anlage eines neuen Projektes, was eventuell mehrere Datenbanken unterstützen muss.
Ich werde für die Implementierung FireDAC benutzen, und hoffe dass dort schon ein paar Hilfen zum Abfangen von
Multi-DB Projekten eingebaut sind.
Ich habe aber bisher nur jeweils mit einzelnen DB gearbeitet (Sqlite, MySql, Firebird), und musste SQL nicht DB-übergreifend denken.
Mein Ziel wäre möglichst mit dem gleichen Satz SQL auf allen DB effizient zu arbeiten, also nicht 4 verschiedenen SQL-Varianten pflegen zu müssen.
(Da kann FireDAC sicher helfen, und SQL abstrahieren).
Insbesondere das Text speichern, normalerweise bis 255-512 aber teilweise auch viel größerer Textblöcke (BLOB) ist angedacht.

Ich frage mich
ob es einen guten Vergleich der Text Datentypen zum Unicode-String-Speichern der verschiedenen DB's im Web gibt,
insbesondere Speicherverbrauch, Performance Suchen/Indizieren, bei variabler Textlänge.
Welche Strategie / Datentype wäre da am Besten, um möglchst kompatibl und effizient zu bleiben ?

Es geht mir um folgende Punkte
- Texte mit variabler Länge ( die eventuell auch mal eine zu klein vorgegebene Länge sprengen könnten ).
- Z.B.: Namen: VARCHAR(80) - Flummi, Brummi, Zummi //<= Wie groß ist der OnDisk-Speicher versch. DB ?
- Z.B.: Namen: VARCHAR(80) - TextMitMehrAls80Zeichen => Könnte auch mal 90 Zeichen kommen, dann kracht es.
- Wieviel realer Speicher wird in den verschiedenen DB wirklich belegt ?
- Speichern von UniCode (UTF8) Texten muss möglich sein.
- Möglichst per Volltextsuche durchsuchbar, sollte einigermaßen effizient sein.
- Möglichst indizierbar (geht das überhaupt bei TextBlobs, oder muss ich selbst von außen Hashes verwalten ) ?
- Wie kann ich große TEXT Blobs effizient per SQL durchsuchen und in Queries einbinden ?
- Macht es Sinn große TEXT Blobs in separate Tabellen, mit HASH Columns, auszulagern, oder machen die modernen DB sowas automatisch ?
- Gibt es vielleicht sogar unterstützende Funktionen in FireDAC genau für solche Fälle ?

Ist also (Worst-Case) VARCHAR(N) oder VARCHAR(MAX) oder TEXT besser ?
- ich denke VARCHAR(N) N=255, als Angabe einer Worst-Case Größe die niemals überschritten wird, ist die bessere Strategie als VARCHAR(MAX)
- Aber speichern manche DB dann immer 255 Zeichen ab, auch wenn der durchschnittliche Text ca. 40 Zeichen ist ? Was nicht gut wäre.
- TEXT wird bei manchen DB wohl separat gespeichert, ist dann effizient gespeichert aber womöglich nicht performant beim Suchen.

Ich hoffe es gibt gute Vergleichsartikel oder Tipps zu dem Thema, in dem alle Datenbanken mal verglichen werden.

Ich hoffe ein paar DB-Experten können mich hier etwas aufklären.

Als Anregungen mal ein paar Infos, siehe unten:

Zitat:

Database string types comparison: VARCHAR(), VARCHAR(n), TEXT
=> here 20.01.22 varchar MySql
VARCHAR:
=> As the name suggests, varchar means character data that is varying.
# Also known as Variable Character, it is an indeterminate length string data type.
# It can hold numbers, letters and special characters.
# MsSql 2008+ can store up to 8000 characters as the maximum length of the string using varchar data type.
# SQL varchar usually holds 1 byte per character and 2 more bytes for the length information.
# It is recommended to use varchar as the data type when columns have variable length and
the actual data is way less than the given capacity.
VARCHAR(max):
=> How SQL varchar(max) is different from varchar(n)?
# There are times where SQL developers (including myself) usually define varchar datatype without a length,
and subsequently, are failed to insert string records in the SQL table.
This is because SQL Server allocates 1 character space as the default value to the varchar column
that is defined without any length.
# In practical scenarios, varchar(n) is used to store variable length value as a string, here ‘n’ denotes the string length
in bytes and it can go up to 8000 characters (MsSql).

# Example
CREATE TABLE Demovarchar
(
Id int NOT NULL IDENTITY(1,1),
LastName varchar(10),
FirstName varchar(10),
Gender varchar,
DepartmentName varchar(20),
Age int
)
INSERT INTO Demovarchar VALUES('Gilbert', 'Kevin','M','Tool Design',33) => OK

INSERT INTO Demovarchar VALUES('Newton Hamilton', 'Isaac','M','Design Head',69) => ERROR

=> VARCHAR( N ) vs. VARCHAR( MAX ):
=> Difference between the varchar(max) and varchar(n) data type [in MsSql ]
varchar(max) varchar(n)
# store up to 2 GB of data # store up to 8000 bytes data
# <= 8000 bytes it uses allocation unit IN_ROW_Data. # stores data in the standard data page
> 8000 bytes it uses LOB_Data page and
stores its pointer in the IN_ROW_Data page
# cannot create an index on the key column # can create an index on this data type
of the varchar(max) data type
# cannot compress the LOB data # can compress data for this data type
# Data retrieval and updation on the LOB data is slow

We do not face such issue in the varchar(n) data type
=> MsSql 2005+ got around this limitation of 8KB storage size and provided a workaround with varchar(max).
It is a non-Unicode large variable-length character data type and can store a maximum of 2^31-1 bytes (2 GB)
of non-Unicode characters.
CREATE TABLE Demovarcharmax
(
ID INT IDENTITY(1, 1) ,
StringCol VARCHAR(MAX)
)
INSERT INTO Demovarcharmax(StringCol) VALUES(REPLICATE(CAST('B' AS VARCHAR(MAX)), 15000))
SELECT Id, StringCol,len(StringCol) AS LengthOfString FROM Demovarcharmax

=> One limitation of using varchar(max) is we cannot create an index that has a varchar(max) as a key column,
instead, it is advisable to do a Full-text index on that column.
=> Do NOT consider it as the varchar with default value = 1.
=> MsSql 2019+, announced support for UTF-8 character encoding to the existing data types (varchar and char).
Reduction in storage and performance improvements
UTF-8 support for varchar data type provides substantial storage savings depending on the character set in use.
For eg, using an UTF-8 enabled collation, changing the column data type from nvarchar(20) to varchar(20) offers a
significant drop in storage requirements since nvarchar(20) requires 40 bytes for storage and varchar(20) needs 20.
=> Impact on string length of SQL varchar with CAST and CONVERT functions
SQL Server stores long string data in the commonly used varchar data type and it becomes helpful to know the
expected and maximum lengths of the strings to display the results in the UI.
=> MsSql takes 30 as the default length for SQL Varchar (with unspecified varchar length) in the SQL Server
when it is used with CAST and CONVERT functions.
In our case, even though the length of the string was 52, it returned 30 as the length as shown in the last result output.
=> One important point to note here is that when an unspecified length varchar field is created,
the default length of such field is 1 (shown in red color below).
When varchar length is unspecified and is used with CAST or CONVERT functions, the CAST or CONVERT returns
n=30 as the default string length of this conversion (marked in blue color below).
=> Insert 10,000 records, to check the data insertion time. Y
Employee_varchar_2000 insertion time 0.08 Seconds
Employee_varchar_4500 insertion time 0.19 Seconds
Employee_varchar_8000 insertion time 0.31 Seconds
Employee_varchar_Max insertion time 2.72 Seconds

Sqlite:
=> What is the maximum size of a VARCHAR ?
# SQLite does not enforce the length of a VARCHAR.
# You can declare a VARCHAR(10) and SQLite will be happy to store a 500-million character string there.
And it will keep all 500-million characters intact.
# Your content is never truncated.
# SQLite understands the column type of "VARCHAR(N)" to be the same as "TEXT", regardless of the value of N.
=> TEXT. The value is a text string, stored using the database encoding (UTF-8, UTF-16BE or UTF-16LE).
# If the declared type of the column contains any of the strings "CHAR", "CLOB", or "TEXT" then that column
has TEXT affinity.
# Notice that the type VARCHAR contains the string "CHAR" and is thus assigned TEXT affinity.
=> Affinity Name Examples
# The following table shows how many common datatype names from more traditional SQL implementations are
converted into affinities by the five rules of the previous section.
# This table shows only a small subset of the datatype names that SQLite will accept.
# Note that numeric arguments in parentheses that following the type name (ex: "VARCHAR(255)") are ignored by
SQLite - SQLite does not impose any length restrictions
(other than the large global SQLITE_MAX_LENGTH limit) on the length of strings, BLOBs or numeric values.

=> Example Typenames
CREATE TABLE Statement or CAST Expression Resulting Affinity Rule Used To Determine Affinity
CHARACTER(20) TEXT 2
VARCHAR(255)
VARYING CHARACTER(255)
NCHAR(55)
NATIVE CHARACTER(70)
NVARCHAR(100)
TEXT
CLOB

VARCHAR, VARCHAR(MAX), TEXT: Gibt es guten OnDisk-Space Vergleich verschiedener DB ?

VARCHAR, VARCHAR(MAX), TEXT: Gibt es guten OnDisk-Space Vergleich verschiedener DB ?

Forumregeln