import math
import re
from collections import Counter

import pandas as pd

WORD = re.compile(r"\w+")


def get_cosine(vec1, vec2):
    intersection = set(vec1.keys()) & set(vec2.keys())
    numerator = sum([vec1[x] * vec2[x] for x in intersection])

    sum1 = sum([vec1[x] ** 2 for x in list(vec1.keys())])
    sum2 = sum([vec2[x] ** 2 for x in list(vec2.keys())])
    denominator = math.sqrt(sum1) * math.sqrt(sum2)

    if not denominator:
        return 0.0
    else:
        return float(numerator) / denominator


def text_to_vector(text):
    words = WORD.findall(text)
    return Counter(words)

# First, open the file and split each line as shown below:
with open(r'C:\Users\User\Desktop\rockyou.txt', "r", encoding="ISO-8859-1") as f:
    lines = f.readlines()

df_result = pd.DataFrame(columns=('id', 'password'))
for i, line in enumerate(lines):
    id, password = line.split()
    df_result.loc[i] = [id, password]
print(df_result)

for i in df_result.index:
    result = cosine(text_to_vector(df_result["id"][i]), text_to_vector(df_result["password"][i]))
    print(result)

        id   password
0   290729     123456
1    79076      12345
2    76789  123456789
3    59462   password
4    49952   iloveyou
5    33291   princess
6    21725    1234567
7    20901    rockyou
8    20553   12345678
9    16648     abc123
10   16227     nicole
11   15308     daniel
12   15163   babygirl
13   14726     monkey
14   14331     lovely
15   14103    jessica

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_1056/1836551814.py in <module>
     42
     43 for i in df_result.index:
---> 44     result = float(cosine(text_to_vector(df_result["id"][i]), text_to_vector(df_result["password"][i])))
     45     print(result)

TypeError: 'float' object is not callable
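
A note on the traceback: the code above defines `get_cosine`, but the loop calls `cosine`, which is not defined anywhere in the snippet. The message `'float' object is not callable` suggests that, somewhere earlier in the notebook session, either `cosine` or `float` was rebound to a float value. The sketch below reproduces that situation with a hypothetical assignment (`cosine = 0.5` is my assumption, not something shown in the post) and then calls the function by its defined name. The sample rows are copied from the printed output; the list name `rows` is illustrative only.

```python
import math
import re
from collections import Counter

WORD = re.compile(r"\w+")


def get_cosine(vec1, vec2):
    """Cosine similarity between two Counter-style token-count vectors."""
    intersection = set(vec1) & set(vec2)
    numerator = sum(vec1[x] * vec2[x] for x in intersection)
    sum1 = sum(v ** 2 for v in vec1.values())
    sum2 = sum(v ** 2 for v in vec2.values())
    denominator = math.sqrt(sum1) * math.sqrt(sum2)
    return numerator / denominator if denominator else 0.0


def text_to_vector(text):
    """Count whole-word tokens (runs of \\w characters) in the text."""
    return Counter(WORD.findall(text))


# Assumption: a stand-in for whatever earlier cell bound the name to a float.
cosine = 0.5
try:
    cosine(text_to_vector("a"), text_to_vector("b"))
except TypeError as exc:
    error_message = str(exc)  # same TypeError as in the traceback

# Calling the function under the name it was defined with avoids the clash.
rows = [("290729", "123456"), ("79076", "12345"), ("59462", "password")]
scores = [get_cosine(text_to_vector(i), text_to_vector(p)) for i, p in rows]
print(error_message)
print(scores)
```

If the name clash is in the notebook state rather than in the code, restarting the kernel and re-running the cells should also clear it. Note that `text_to_vector` tokenizes whole words, so an id and a password each become a single token, and the similarity is 0.0 unless the two strings are identical.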