Cherche unité pour comparer 2 strings

**Bruno13** · 16/05/2008, 14h52

Bonjour à tous,

Comme l'indique mon titre je cherche une fonction assez performante en qualité de résultat qui permet de trouver la similarité en % entre 2 chaînes de caractère.

J'ai biensûr trouvé la fameuse fonction ci-dessous mais je n'en suis pas tout fait satisfait en terme de résultat.

Quelqu'un aurait-il une autre fonction ?

Code :

Sélectionner tout - Visualiser dans une fenêtre à part

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
 
... Compare two strings in percent (strings simularity)?
Author: Eugene Nosko
Homepage: http://www.dxstar.com
 
Compares two strings in percent (how they are similar to each other)
Returns byte value from 0 to 100%
 
examples:
 
var
  Percent: byte;
 
begin
  Percent := CompareStringsInPercent('this is a test', 'This is another test'); // 37%
  Percent := CompareStringsInPercent('this is some string', 'and yet another some string'); // 24%
  Percent := CompareStringsInPercent('abcde', 'fghij'); // 0%
  Percent := CompareStringsInPercent('1.jpg', '2.jpg'); // 81%
 
...
 
}
 
function CompareStringsInPercent(Str1, Str2: string): Byte;
type
  TLink = array[0..1] of Byte;
var
  tmpPattern: TLink;
  PatternA, PatternB: array of TLink;
  IndexA, IndexB, LengthStr: Integer;
begin
  Result := 100;
  // Building pattern tables
  LengthStr := Max(Length(Str1), Length(Str2));
  for IndexA := 1 to LengthStr do 
  begin
    if Length(Str1) >= IndexA then 
    begin
      SetLength(PatternA, (Length(PatternA) + 1));
      PatternA[Length(PatternA) - 1][0] := Byte(Str1[IndexA]);
      PatternA[Length(PatternA) - 1][1] := IndexA;
    end;
    if Length(Str2) >= IndexA then 
    begin
      SetLength(PatternB, (Length(PatternB) + 1));
      PatternB[Length(PatternB) - 1][0] := Byte(Str2[IndexA]);
      PatternB[Length(PatternB) - 1][1] := IndexA;
    end;
  end;
  // Quick Sort of pattern tables
  IndexA := 0;
  IndexB := 0;
  while ((IndexA < (Length(PatternA) - 1)) and (IndexB < (Length(PatternB) - 1))) do 
  begin
    if Length(PatternA) > IndexA then 
    begin
      if PatternA[IndexA][0] < PatternA[IndexA + 1][0] then 
      begin
        tmpPattern[0]           := PatternA[IndexA][0];
        tmpPattern[1]           := PatternA[IndexA][1];
        PatternA[IndexA][0]     := PatternA[IndexA + 1][0];
        PatternA[IndexA][1]     := PatternA[IndexA + 1][1];
        PatternA[IndexA + 1][0] := tmpPattern[0];
        PatternA[IndexA + 1][1] := tmpPattern[1];
        if IndexA > 0 then Dec(IndexA);
      end
      else 
        Inc(IndexA);
    end;
    if Length(PatternB) > IndexB then 
    begin
      if PatternB[IndexB][0] < PatternB[IndexB + 1][0] then 
      begin
        tmpPattern[0]           := PatternB[IndexB][0];
        tmpPattern[1]           := PatternB[IndexB][1];
        PatternB[IndexB][0]     := PatternB[IndexB + 1][0];
        PatternB[IndexB][1]     := PatternB[IndexB + 1][1];
        PatternB[IndexB + 1][0] := tmpPattern[0];
        PatternB[IndexB + 1][1] := tmpPattern[1];
        if IndexB > 0 then Dec(IndexB);
      end
      else 
        Inc(IndexB);
    end;
  end;
  // Calculating simularity percentage
  LengthStr := Min(Length(PatternA), Length(PatternB));
  for IndexA := 0 to (LengthStr - 1) do 
  begin
    if PatternA[IndexA][0] = PatternB[IndexA][0] then 
    begin
      if Max(PatternA[IndexA][1], PatternB[IndexA][1]) - Min(PatternA[IndexA][1],
        PatternB[IndexA][1]) > 0 then Dec(Result,
        ((100 div LengthStr) div (Max(PatternA[IndexA][1], PatternB[IndexA][1]) -
          Min(PatternA[IndexA][1], PatternB[IndexA][1]))))
      else if Result < 100 then Inc(Result);
    end
    else 
      Dec(Result, (100 div LengthStr))
  end;
  SetLength(PatternA, 0);
  SetLength(PatternB, 0);
end;

**Bruno13** · 16/05/2008, 15h07

Re-Bonjour,

Je viens de trouver cette nouvelle fonction, cependant il faudrait la modifier (peut-être) pour obtenir un pourcentage comme réponse. J'avoue que je ne l'ai pas encore testé.

Code :

Sélectionner tout - Visualiser dans une fenêtre à part

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
 
{
What is Levenshtein Distance/Metric?
Levenshtein distance (LD) is a measure of the similarity between two strings, which we will refer to as the source string (s) and the target string (t). The distance is the number of deletions, insertions, or substitutions required to transform s into t.
If  x,y are strings then the distances ist defined by  
 
LD(x,y) = min(i)(#S(i)+#D(i)+#I(i) ) for all possible edit sessions i.
 
#S(I), #D(i), # I(i) are the number of substitutions, deletions and insertions required in the edit session i.
 
For example,
 
„h If s is "test" and t is "test", then LD(s,t) = 0, because no transformations are needed. The strings are already identical.
„h If s is "test" and t is "tent", then LD(s,t) = 1, because one substitution (change "s" to "n") is sufficient to transform s into t.
 
The greater the Levenshtein distance, the more different the strings are.
Levenshtein distance is named after the Russian scientist Vladimir Levenshtein, who devised the algorithm in 1965.
 
„h Spell checking
„h Speech recognition
„h DNA analysis
„h Plagiarism detection
 
The Damerau-Levenstein metric is a generalisation of the Levenstein metric. It assigns weights to the edit operations. It is defined as
 
DLD(x,y) = min(i)(#S(i)*Ws+#D(i)*Wd+#I(i)*Wi ) for all possible edit sessions i.
Again #S(I), #D(i), # I(i) are the number of substitutions, deletions and insertions required in the edit session i.
Ws, Wd, Wi are positive numbered weighting factors for the edit operations.
 
 
For more information search the web for [levenshtein ¡Vphp] using www.google.com. levensthtein is a function in the PHP library. The string ¡Vphp is used to exclude all the PHP manual pages from the search, that obscure the more interesting links.
 
With this metric and a quadruppel of numbers (the weights and a threshold value of likeness) you can implement a StringsAreLike function of your own taste. The below example I used to search for similar names (Meyer, Meier, Mayer, Mayr) in a database. This is usefull if you can not remeber the correct spelling of a persons name.
}
 
Implementation.
 
uses
  math, sysutils;
 
const
  ws=3;                        //  weight for substitution
  wi=1;                        //  weight for insertion
  wd=6;                        //  weight for deleting
  th=4;
 
 
function StringsAreLike(const s1,s2:string):boolean;
begin
  result:= DamerauLevenshteinLike(s1,s2,ws,wi,wd)<=th;
end;
 
function DamerauLevenshteinLike(const s1,s2:string;ws,wi,wd:integer):Integer;
 
VAR
  i,j:Integer;
 
function Pp(x,y:Integer):Integer;
begin
  if AnsiUpperCase(s1[x])=AnsiUpperCase(s2[y]) then Pp:=0 else Pp:=ws;
end;
 
var
  Wmax:integer;
  d:array of array of integer;
 
begin
  Wmax:=Max(length(s1),length(s2))+1;
  SetLength(d,Wmax,Wmax);
  dec(Wmax);
  d[0,0]:=0;
  for j:=1 TO Wmax DO d[0,j]:=d[0,Pred(j)]+wi;
  for i:=1 TO Wmax DO d[i,0]:=d[Pred(i),0]+wd;
  for i:=1 TO Length(s1) DO
    for j:=1 TO Length(s2) DO
      d[i,j]:=MinIntValue([ d[Pred(i),Pred(j)]+Pp(i,j),      //substitution
                            d[     i ,Pred(j)]+wi,            //insertion
                            d[Pred(i),     j ]+wd             //deletion
                         ]);
  result:=d[Length(s1),Length(s2)];
  SetLength(d,0);
end{DamerauLevenshteinLike};

Cherche unité pour comparer 2 strings

Composants VCL Delphi

Vue hybride

Discussions similaires

Partager

Partager