Delphi-PRAXiS - View single post

**himitsu**

Sooo, now that I have more time again and, above all, finally know where that nasty bug was hiding that I simply could not find (see the thread "fehlerhafte Referenzzählung? (Record + dyn. Array)", roughly "faulty reference counting? (record + dyn. array)"), things can move forward again.

Lovely, when you have to rework/patch a good handful of usages in the code because Delphi simply does not work properly.
(This project is more important right now, but I already dread having to check himXML for similar "traps", since several of these records are used there as well.)

Quote from himitsu:

This is not necessarily easy to implement.

For now I have commented out the code that already existed for this, and I will not pursue it further at the moment (unless someone turns up who needs it).

And once more on Unicode:
This class will be designed entirely and exclusively for Unicode, but it has a converter for emergencies.


Delphi source code:

Class Function Convert(Const Expr: RawByteString; SourceEncoding: TEncoding = nil): UnicodeString;
Class Function Convert(Const Expr: UnicodeString; DestEncoding:   TEncoding = nil): RawByteString;

Once it is running, a separate single-byte version of it will be created, and for multi-byte character sets such as UTF-8 a redirect to the Unicode version will be set up.
And all of this will only be available for Delphi 2009 or higher.
(An alternative adaptation down to D2006/TDE is still open; going any lower than that will not be possible.)
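To make the converter idea more concrete, here is a minimal sketch of what the two Convert overloads above might do, built on the RTL's TEncoding.GetString and TEncoding.GetBytes. The function names BytesToUnicode and UnicodeToBytes are hypothetical stand-ins (the class itself is not shown in this post), and falling back to TEncoding.Default when no encoding is passed is an assumption, not confirmed behaviour.

```delphi
program ConvertSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical stand-in for Convert(RawByteString): decodes raw bytes to Unicode.
// Assumption: a nil encoding means "use the system default ANSI encoding".
function BytesToUnicode(const Expr: RawByteString; SourceEncoding: TEncoding = nil): UnicodeString;
var
  Bytes: TBytes;
begin
  if Length(Expr) = 0 then
  begin
    Result := '';
    Exit;
  end;
  if SourceEncoding = nil then
    SourceEncoding := TEncoding.Default;
  // Copy the raw bytes into a TBytes buffer, then let TEncoding decode them.
  SetLength(Bytes, Length(Expr));
  Move(Expr[1], Bytes[0], Length(Expr));
  Result := SourceEncoding.GetString(Bytes);
end;

// Hypothetical stand-in for Convert(UnicodeString): encodes Unicode to raw bytes.
// Note: the code page tag of the result is not adjusted in this sketch.
function UnicodeToBytes(const Expr: UnicodeString; DestEncoding: TEncoding = nil): RawByteString;
var
  Bytes: TBytes;
begin
  Result := '';
  if Length(Expr) = 0 then
    Exit;
  if DestEncoding = nil then
    DestEncoding := TEncoding.Default;
  Bytes := DestEncoding.GetBytes(Expr);
  SetLength(Result, Length(Bytes));
  if Length(Bytes) > 0 then
    Move(Bytes[0], Result[1], Length(Bytes));
end;

var
  Original: UnicodeString;
  Utf8: RawByteString;
begin
  Original := 'Gr'#$00FC#$00DF'e';  // "Gruesse" written with u-umlaut and sharp s
  Utf8 := UnicodeToBytes(Original, TEncoding.UTF8);
  Writeln(Length(Utf8));                                      // byte count in UTF-8
  Writeln(BytesToUnicode(Utf8, TEncoding.UTF8) = Original);   // round trip check
end.
```

Two differently named functions are used here only because a call with a plain string literal could be ambiguous between the two overloads in a standalone sketch.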

The current content of my RegExp definition (line ends are cut off a little ... for the rest see RegEx.txt above).
As I said, if anyone spots errors or possible improvements ... please report them early.


Code:

description

   『patt』              pattern

   「patt」              optional

   〔patt 1║patt 2║…〕   alternative group

   【name】              see description "name"

   〈…〉                 -

expression

   『【delimiter】【pattern】【delimiter】【modifiers】』

delimiter

   A delimiter can be any non-alphanumeric, non-whitespace character, but ...

   Often used delimiters are forward slashes (/), hash signs (#) and tilde...

   The delimiters in order of their statistical use: /#~!@%°=&

modifiers

   『「【set】」「-【reset】」』

   Values for 【set】 and 【reset】 are groups of the following characters:

   i          remCaseLess        Do case-insensitive pattern matching.

   m          remMultiLine       Treat string as multiple lines. That is, ...

   A    (2)   remAnchored        *

   D    (2)   remDollarEndOnly   *(ignored if modifier "m" is set)

   s          remSingleLine      Treat string as single line. That is, cha...

   S    (1)                      *improves execution speed

   U          remUngreedy        *suppresses greediness

   x          remExtended        Extend your pattern's legibility by permi...

   u    (1)                      *pattern is interpreted as UTF-8

   p    (1)   (preserve)         Preserve the string matched such that ${^...

   g    (1)   (global)           Global matching

   1)   not supported

   2)   not allowed as pattern in extended groups

pattern syntax - meta-characters:

   『\…』     general escape character with several uses

   『(…)』    subpattern

   『…|…』    alternative patterns

   『.』      match any character except newline (by default)

   『^』      assert start of subject (or line, in multiline mode)

   『$』      assert end of subject (or line, in multiline mode)

   『[…]』    character class

   『…?』     0 or 1 quantifier (or quantifier minimizer)

   『…*』     0 or more quantifier

   『…+』     1 or more quantifier

   『…{…}』   min/max quantifier

   『#…』     comment - only if modifier "x" is set

   To be used literally, these characters must be escaped (delimited with \).

meta-characters in character classes:

   『\…』     general escape character

   『^』      negates the class, but only if it is the first character

   『-』      indicates character range

   『[:…:]』  POSIX character class

delimited characters and classes

   \0         null or Octal character code

   \1 to \9   back reference

   \a         bell (alert)

   \A         text start

   \b \B      word boundary

   \c         control character

   \C         single character

   \d \D      decimal digit

   \e         escape

   \E         end of quote (\Q, \L and \U)

   \f         form feed

   \g         back reference

   \G         matches start

   \h \H      horizontal space characters

   \k         named back reference

   \K         keep the left stuff

   \l \L      lowercase characters

   \n         new line

   \N         named unicode character

   \p \P      named property

   \Q         quote

   \r         carriage return

   \R         newline sequence

   \s \S      space

   \t         tabulator

   \u \U      uppercase characters

   \v \V      vertical space characters

   \w \W      word characters

   \x         heXadecimal character code

   \X         eXtended unicode sequence

   \z         text end

   \Z         text end or end of last line

   \<         start of word

   \>         end of word

   The following characters must be escaped (delimited with \) if they are to be used literally.

      \ ( ) | . ^ $ [ ? * + {

      #   (if modifier "x" is set)

characters

   『\0【digit】』                octal character code

   『\x【x-digit】【x-digit】』   heXadecimal character code (Ansi)

   『\x{【x-digits】}』           heXadecimal character code (Unicode)

   『\c【character】』            control char

   『\N{【name】}』               named unicode character

   supported names

      U+xxxx                      hexadecimal character code

named character class (named unicode properties)

   『\p【character】』   for names of only one letter

   『\p{【name】}』

   『\P【character】』   any character not in this class

   『\P{【name】}』      any character not in this class

   supported classes

      IsCntrl, IsSpace, IsSpacePerl, IsDigit, IsXDigit, IsUpper, IsLower,

      IsAlpha, IsAlnum, IsWord, IsPunct, IsGraph, IsPrint, IsASCII

   supported scripts

      Arabic, Armenian, Balinese, Bengali, Bopomofo, Braille, Buginese, Bu...

      Canadian_Aboriginal, Cherokee, Common, Coptic, Cuneiform, Cypriot, C...

      Deseret, Devanagari, Ethiopic, Georgian, Glagolitic, Gothic, Greek, ...

      Gurmukhi, Han, Hangul, Hanunoo, Hebrew, Hiragana, Inherited, Kannada...

      Kharoshthi, Khmer, Lao, Latin, Limbu, Linear_B, Malayalam, Mongolian...

      New_Tai_Lue, Nko, Ogham, Old_Italic, Old_Persian, Oriya, Osmanya, Ph...

      Phoenician, Runic, Shavian, Sinhala, Syloti_Nagri, Syriac, Tagalog, ...

      Tai_Le, Tamil, Telugu, Thaana, Thai, Tibetan, Tifinagh, Ugaritic, Yi

   supported general category property codes

      C     other

      Cc    control

      Cf    format

      Cn    unassigned

      Co    private use

      Cs    surrogate

      L     letter

      Ll    lower case letter - specifying caseless matching does not affe...

      Lm    modifier letter

      Lo    other letter

      Lt    title case letter - specifying caseless matching does not affe...

      Lu    upper case letter - specifying caseless matching does not affe...

      M     mark

      Mc    spacing mark

      Me    enclosing mark

      Mn    non-spacing mark

      N     number

      Nd    decimal number

      Nl    letter number

      No    other number

      P     punctuation

      Pc    connector punctuation

      Pd    dash punctuation

      Pe    close punctuation

      Pf    final punctuation

      Pi    initial punctuation

      Po    other punctuation

      Ps    open punctuation

      S     symbol

      Sc    currency symbol

      Sk    modifier symbol

      Sm    mathematical symbol

      So    other symbol

      Z     separator

      Zl    line separator

      Zp    paragraph separator

      Zs    space separator

character class

   『[「^」【character list】「【character list】…」]』

   character list

      『【character】』                 single character or delimited char...

      『【character】-【character】』   range of characters

      『\【class】』                    delimited class

      『[:【POSIX】:]』                 POSIX character class

   ^   inverts the class

POSIX character class

   『[「^」:【name】:]』:

   this can be used only inside a character class ( […] )

   supported classes

      cntrl, space, blank, digit, xdigit, upper, lower,

      alpha, alnum, punct, graph, print

group

   『(【pattern】)』

named group

    『(?「P」<【name】>【pattern】)』

modifier change (extended group)

    『(?【modifiers】)』

extended group

    『(?「【modifiers】」:【pattern】)』

look-ahead

    『(?「【modifiers】」=【pattern】)』

negative look-ahead

    『(?「【modifiers】」!【pattern】)』

look-behind

    『(?「【modifiers】」<=【pattern】)』

negative look-behind

    『(?「【modifiers】」<!【pattern】)』

recursive subpattern

    『(?「-║+」【number】)』

    『(?R)』

    『(?P>【name】)』

    『(?P&【name】)』

   clones the pattern (not the result) of a previous group

   (?R) = (?0)

conditional subpattern

   『(?(【condition】)【yes-pattern】「|【no-pattern】」)』

   condition

      『「-║+」【number】』

      『R』

      『{【name】}』

      『【pattern】』

back references

   『\【digit】』              for the references 1 to 9

   『\g【digit】』

   『\g{「-║+」【number】}』

   『\g【character】』         for names of only one letter

   『\g{【name】}』

named back references

   『\k<【name】>』

   『\k'【name】'』

   『\k{【name】}』

comments

    『(?#【text】)』

    『#【text】([\r\n]|$)』   (1)

   not in character classes

   1)   only if modifier "x" is set

quantifier

    『【pattern】?「?║+」』       once or not at all       equivalent to  ...

    『【pattern】*「?║+」』       zero or more times       equivalent to  ...

    『【pattern】+「?║+」』       at least once            equivalent to  ...

    『【pattern】{n}「?║+」』     n times

    『【pattern】{n,}「?║+」』    at least n times

    『【pattern】{n,m}「?║+」』   n to m times

characters and character classes:

   .    any character - if multiple lines are not activated then doesn't m...

   \0   null character

   \a   bell (alert #7)

   \n   new line (#10)

   \f   form feed (#12)

   \e   escape (#27)

   \t   tabulator (#9)

   \h   horizontal space characters

   \v   vertical space characters

   \r   carriage return (#13)

   \R   newline sequence

   \d   decimal digit

   \w   word character

   \s   space

   \X   eXtended unicode sequence

   \C   single char - one character or a part of surrogate pairs

   \H   any character but a horizontal space character

   \V   any character but a vertical space character

   \D   any character but a decimal digit

   \W   any character but a word character

   \S   any character but a space

control classes:

   ^    line start

   $    line end

   \A   text start

   \G   matches start

   \z   text end

   \Z   text end or end of last line

   \b   word boundary

   \B   not a word boundary

   \<   start of word

   \>   end of word

   \l   lowercase next char

   \u   uppercase next char

   \L   lowercase till \E

   \U   uppercase till \E

   \Q   quote (disable) pattern metacharacters till \E

   \E   end of quote (\Q, \L and \U)

   \K   keep the stuff left of the \K, don't include it in result

options

   reoSplitNoEmpty                   If this flag is set, then from SPLIT ...

   reoSplitDelimCapture              If this flag is set, then be parenthe...

   reoOffsetCapture                  If this flag is set, then returned wi...

   reoSplitSetCapture                Orders results so that $array[0] an a...

   default (no reoSplitSetCapture)   Orders results so that $array[0] an a...

   reoCustomizeLinebreaks

related character classes and sets

   DESCRIPTION       POSIX         PERL FN           PERL PERL            ...

                                                                          ...

   ---------------   -----------   ---------------   --   ----------------...

   any char                                          .    [^\n\r]

   control           [:cntrl:]     \p{IsCntrl}            [\x00-\x1F\x7F] ...

   white space+tab   [:blank:]     \p{IsSpace}            [ \t]           ...

   whitespace                      \p{IsSpace}       \s   [ \f\t\v]

   whitespacePerl    [:space:]     \p{IsSpacePerl}        [ \f\n\r\t\v]   ...

   punctuation       [:punct:]     \p{IsPunct}            [!-/:-@[-`{-~]  ...

   decimal digit     [:digit:]     \p{IsDigit}       \d   [0-9]           ...

   hexadecimal       [:xdigit:]    \p{IsXDigit}           [0-9A-Fa-f]     ...

   upper             [:upper:]     \p{IsUpper}       \u   [A-Z]           ...

   lower             [:lower:]     \p{IsLower}       \l   [a-z]           ...

   upper+lower       [:alpha:]     \p{IsAlpha}            [A-Za-z]        ...

   alphanumeric      [:alnum:]     \p{IsAlnum}            [A-Za-z0-9]     ...

   alphanumeric+_    [:word:]      \p{IsWord}        \w   [A-Za-z0-9_]    ...

   printable         [:graph:]     \p{IsGraph}            [!-~]           ...

   printable+space   [:print:]     \p{IsPrint}            [ -~]           ...

   any ASCII         [:ascii:]     \p{IsASCII}            [\x00-\x7F]     ...

   any Unicode                                            [\x00-\x{FFFF}]

   [:punct:]    []!"#$%&\'()*+,./:;<=>?@\\^_`{|}~[-]

   [:xdigit:]   [[:digit:]A-Fa-f]

   [:alpha:]    [[:upper:][:lower:]]

   [:alnum:]    [[:alpha:][:digit:]]

   [:word:]     [[:alnum:]_]

   [:graph:]    [[:word:][:punct:]]

   [:print:]    [ [:graph:]]
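
To see a few of the constructs from this definition in action (named group, look-ahead, ungreedy quantifier, inline modifier change, \Q…\E quoting), here is a small sketch. It does not use the class discussed in this thread, since its interface is not shown here; as a stand-in it assumes the PCRE-based TRegEx record from the RTL unit System.RegularExpressions, which only exists in Delphi XE and later (in XE itself the unit is named RegularExpressions). The helper names FirstMatch and NamedGroup are made up for this example.

```delphi
program RegexSyntaxSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.RegularExpressions;

// Hypothetical helper: returns the text of the first match, or '' if none.
function FirstMatch(const Input, Pattern: string): string;
var
  M: TMatch;
begin
  M := TRegEx.Match(Input, Pattern);
  if M.Success then
    Result := M.Value
  else
    Result := '';
end;

// Hypothetical helper: returns the content of a named group, or '' if the
// pattern does not match at all.
function NamedGroup(const Input, Pattern, GroupName: string): string;
var
  M: TMatch;
begin
  M := TRegEx.Match(Input, Pattern);
  if M.Success then
    Result := M.Groups[GroupName].Value
  else
    Result := '';
end;

begin
  // named group: (?P<name>pattern)
  Writeln(NamedGroup('2009-10-31', '(?P<year>\d{4})-(?P<month>\d{2})', 'year'));
  // look-ahead: the digits must be followed by " EUR", which is not part of the match
  Writeln(FirstMatch('costs 42 EUR', '\d+(?= EUR)'));
  // ungreedy quantifier: +? stops at the first closing bracket
  Writeln(FirstMatch('<a><b>', '<.+?>'));
  // modifier change inside the pattern: (?i) switches on caseless matching
  Writeln(TRegEx.IsMatch('DELPHI', '(?i)delphi'));
  // \Q...\E: the plus sign is taken literally
  Writeln(TRegEx.IsMatch('1+1=2', '\Q1+1\E'));
end.
```

Constructs that the RTL engine may handle differently from this definition (for example \< and \> or the \u and \l classes) are deliberately left out of the sketch.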


Re: ePCRE (himi's TRegEx)