Removing HTML tags
Hi!
I'm back yet again with a new problem. I want to remove all HTML tags so that I am left with only, for example, the link captions or the text visible on the page, and can then process that text. Can anyone tell me how this could work? Mind you, it should all be done in Delphi ^^. I look forward to your help and hope that I will be able to answer this myself as well!
Re: Removing HTML tags
A start would be to find every < and delete everything from its position up to and including the following >.
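The idea above can be sketched roughly as follows. This is a minimal sketch and not code from the thread: the function name RemoveTags and the sample string are made up for illustration, and it assumes that every < belongs to a tag, so a literal < in the visible text would confuse it.

```delphi
program RemoveTagsDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, StrUtils;

// Hypothetical helper: deletes every span that runs from a '<'
// up to and including the next '>'.
function RemoveTags(const Html: string): string;
var
  StartPos, EndPos: Integer;
begin
  Result := Html;
  StartPos := Pos('<', Result);
  while StartPos > 0 do
  begin
    EndPos := PosEx('>', Result, StartPos);
    if EndPos = 0 then
      Break; // unmatched '<': leave the rest of the text untouched
    Delete(Result, StartPos, EndPos - StartPos + 1);
    // The text has shifted left, so search again from the same position.
    StartPos := PosEx('<', Result, StartPos);
  end;
end;

begin
  // Sample input chosen for illustration only.
  Writeln(RemoveTags('<a href="index.html">Home</a> and <b>News</b>'));
end.
```

For the sample input this should print Home and News. Deleting inside the string is simple but copies the tail on every tag, so for large pages building a new result string is likely the better choice.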
Re: Removing HTML tags
I just had the same problem and, as a stopgap, cobbled together the following from various sources. Just call Html2Txt().
Delphi source code:
// Requires SysUtils, StrUtils and Classes in the uses clause.

const
  // Named entities for the ISO 8859-1 codes 160 to 255, in code order.
  Latin1Entities: array[160..255] of string = (
    'nbsp', 'iexcl', 'cent', 'pound', 'curren', 'yen', 'brvbar', 'sect',
    'uml', 'copy', 'ordf', 'laquo', 'not', 'shy', 'reg', 'macr',
    'deg', 'plusmn', 'sup2', 'sup3', 'acute', 'micro', 'para', 'middot',
    'cedil', 'sup1', 'ordm', 'raquo', 'frac14', 'frac12', 'frac34', 'iquest',
    'Agrave', 'Aacute', 'Acirc', 'Atilde', 'Auml', 'Aring', 'AElig', 'Ccedil',
    'Egrave', 'Eacute', 'Ecirc', 'Euml', 'Igrave', 'Iacute', 'Icirc', 'Iuml',
    'ETH', 'Ntilde', 'Ograve', 'Oacute', 'Ocirc', 'Otilde', 'Ouml', 'times',
    'Oslash', 'Ugrave', 'Uacute', 'Ucirc', 'Uuml', 'Yacute', 'THORN', 'szlig',
    'agrave', 'aacute', 'acirc', 'atilde', 'auml', 'aring', 'aelig', 'ccedil',
    'egrave', 'eacute', 'ecirc', 'euml', 'igrave', 'iacute', 'icirc', 'iuml',
    'eth', 'ntilde', 'ograve', 'oacute', 'ocirc', 'otilde', 'ouml', 'divide',
    'oslash', 'ugrave', 'uacute', 'ucirc', 'uuml', 'yacute', 'thorn', 'yuml');

// Converts a single entity such as '&amp;' or '&#228;' into its character.
// Unknown entities yield a space.
function GiveSZ(const HCode: string): Char;
var
  i: Integer;
  Name: string;
begin
  Result := ' ';
  // Strip the leading '&' and the trailing ';'.
  Name := Copy(HCode, 2, Length(HCode) - 2);
  if Name = 'quot' then
    Result := '"'
  else if Name = 'amp' then
    Result := '&'
  else if Name = 'lt' then
    Result := '<'
  else if Name = 'gt' then
    Result := '>'
  else if Name = 'nbsp' then
    Result := ' ' // an ordinary space is more useful in plain text
  else if (Name <> '') and (Name[1] = '#') then
  begin
    // Numeric form, e.g. '&#228;'.
    if TryStrToInt(Copy(Name, 2, MaxInt), i) and (i > 0) and
      (i <= Ord(High(Char))) then
      Result := Char(i);
  end
  else
    // Entity names are case sensitive ('Auml' versus 'auml').
    for i := Low(Latin1Entities) to High(Latin1Entities) do
      if Name = Latin1Entities[i] then
      begin
        Result := Char(i);
        Break;
      end;
end;

// Replaces every '&...;' sequence of at most 8 characters in sValue
// with the character returned by GiveSZ.
function ReplaceHTMLChar(const sValue: string): string;
var
  tagStartPos, tagEndPos: Integer;
  tag: string;
begin
  Result := sValue;
  tagStartPos := Pos('&', Result);
  while tagStartPos > 0 do
  begin
    tagEndPos := PosEx(';', Result, tagStartPos);
    if tagEndPos = 0 then
      Break; // no terminating ';' left
    if tagEndPos - tagStartPos < 8 then
    begin
      tag := Copy(Result, tagStartPos, tagEndPos - tagStartPos + 1);
      Result := Copy(Result, 1, tagStartPos - 1) + GiveSZ(tag) +
        Copy(Result, tagEndPos + 1, MaxInt);
    end;
    // Continue behind the replaced character, so that a decoded '&'
    // is not decoded a second time.
    tagStartPos := PosEx('&', Result, tagStartPos + 1);
  end;
end;

function Html2Txt(const html: string): string;
var
  istag: Boolean;
  i: Integer;
  ch: Char;
  temp: string;
  slRes: TStrings;
begin
  temp := '';
  istag := False;
  // Drop everything between '<' and '>'.
  for i := 1 to Length(html) do
  begin
    ch := html[i];
    if (ch = '<') and not istag then
      istag := True
    else if (ch = '>') and istag then
      istag := False
    else if not istag then
      temp := temp + ch;
  end;
  // Decode the entities only after the tags are gone, so that an escaped
  // '&lt;b&gt;' stays visible text instead of being stripped as a tag.
  temp := ReplaceHTMLChar(temp);
  slRes := TStringList.Create;
  try
    slRes.Text := temp;
    // Trim every line and drop the empty ones.
    for i := slRes.Count - 1 downto 0 do
    begin
      slRes[i] := Trim(slRes[i]);
      if slRes[i] = '' then
        slRes.Delete(i);
    end;
    Result := slRes.Text;
  finally
    slRes.Free;
  end;
end;

This removes all HTML tags, including the script tags themselves, and replaces the HTML special characters. Note that the text between <script> and </script> is kept, because only the tags are stripped. I'm currently still working on a solution using regular expressions. Regards, tr909
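A regular-expression variant of the tag stripping might look roughly like the sketch below. It is not tr909's solution, which the thread does not show. It assumes Delphi XE2 or later, where the unit System.RegularExpressions with TRegEx is available, and the function name StripTagsRegex is made up for illustration.

```delphi
program StripTagsRegexDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.RegularExpressions;

// Hypothetical helper, not part of the code posted in the thread.
function StripTagsRegex(const Html: string): string;
begin
  // '<[^>]*>' matches a '<', any run of characters other than '>',
  // and the closing '>'. Every match is replaced by an empty string.
  Result := TRegEx.Replace(Html, '<[^>]*>', '');
end;

begin
  // Sample input chosen for illustration only.
  Writeln(StripTagsRegex('<a href="x.html">Link text</a> and <b>bold</b>'));
end.
```

Like the character loop above, this only removes the tags themselves. Entities would still have to be decoded afterwards, and a > inside an attribute value would end the match too early.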
Re: Removing HTML tags
Thanks for now, tr909!
Could you also tell me how you call the functions? That is, in which order, and what do you pass in?
Re: Removing HTML tags
Quote:
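Re: Removing HTML tags
Delphi source code:
// Assumes WebBrowser1 is a TWebBrowser that has finished loading the page
// and PlainText is a string variable.
PlainText := WebBrowser1.OleObject.Document.documentElement.innerText;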
Re: Removing HTML tags
Oops, I completely overlooked that!
:wall: :oops: *embarrassing* Thanks!
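Re: Removing HTML tags
General solution:
Delphi source code:
// Requires SysUtils (StrScan).
function StripTags(const line: string): string;
var
  p, p1, p2: PChar;
  chunk: string;
begin
  Result := '';
  p := PChar(line);
  while (p <> nil) and (p^ <> #0) do
  begin
    p1 := StrScan(p, '<');
    if p1 <> nil then
      p2 := StrScan(p1, '>')
    else
      p2 := nil;
    if p2 <> nil then
    begin
      // Keep the text in front of the tag, then continue behind its '>'.
      // SetString copies into a managed string, so no uninitialized
      // PChar buffer is needed.
      SetString(chunk, p, p1 - p);
      Result := Result + chunk;
      p := p2 + 1;
    end
    else
    begin
      // No complete tag left: keep the remainder unchanged.
      Result := Result + p;
      p := nil;
    end;
  end;
end;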
Re: Removing HTML tags
Quote:
Re: Removing HTML tags
Optimized version that recognizes a stray < or >. The only remaining problem is when both occur in the "right" order.
Delphi source code:
// Requires SysUtils (StrScan).
function StripTags(const line: string): string;
var
  p, p1, p2, pt: PChar;
  chunk: string;
begin
  Result := '';
  p := PChar(line);
  while (p <> nil) and (p^ <> #0) do
  begin
    p1 := StrScan(p, '<');
    if p1 <> nil then
      p2 := StrScan(p1, '>')
    else
      p2 := nil;
    if p2 <> nil then
    begin
      // A further '<' in front of the '>' means the earlier '<' was a
      // stray character: move the tag start forward and keep the stray
      // '<' as text. The loop handles several stray '<' in a row.
      pt := StrScan(p1 + 1, '<');
      while (pt <> nil) and (pt < p2) do
      begin
        p1 := pt;
        pt := StrScan(p1 + 1, '<');
      end;
      SetString(chunk, p, p1 - p);
      Result := Result + chunk;
      p := p2 + 1;
    end
    else
    begin
      // No complete tag left: keep the remainder, including a stray '<'.
      Result := Result + p;
      p := nil;
    end;
  end;
end;
Delphi-PRAXiS (c) 2002 - 2023 by Daniel R. Wolf, 2024 by Thomas Breitkreuz