Python script - pandas - error
Hello,
The train.csv file is 1 GB and has more than 5 million rows.
I don't know whether the problem is a lack of system memory or something else.
Code:
import pandas as pd
import numpy as np

# Load the training and test sets (the traceback points at the first read)
df_train = pd.read_csv('train.csv')
df_test = pd.read_csv('test.csv')

# Basic shape, column names and summary statistics
print('Size of training data: ' + str(df_train.shape))
print('Size of testing data: ' + str(df_test.shape))
print('\nColumns:' + str(df_train.columns.values))
print(df_train.describe())
#print(df_train['place_id'])
# Count distinct place ids; nunique() ignores missing values and avoids building a list and a set
print('\nNumber of place ids: ' + str(df_train['place_id'].nunique()))
Traceback (most recent call last):
File "/media/msi-ubuntu/4a613636-5602-4e5a-9856-8e1aef2a7f43/Mes_documents/ing_prob/0-mooc-kaggle/facebook/face-1.py", line 4, in <module>
df_train = pd.read_csv('train.csv')
File "/home/msi-ubuntu/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 498, in parser_f
return _read(filepath_or_buffer, kwds)
File "/home/msi-ubuntu/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 285, in _read
return parser.read()
File "/home/msi-ubuntu/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 747, in read
ret = self._engine.read(nrows)
File "/home/msi-ubuntu/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 1197, in read
data = self._reader.read(nrows)
File "pandas/parser.pyx", line 766, in pandas.parser.TextReader.read (pandas/parser.c:7988)
File "pandas/parser.pyx", line 816, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:8661)
File "pandas/parser.pyx", line 1924, in pandas.parser._concatenate_chunks (pandas/parser.c:24468)
MemoryError
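The traceback ends in _concatenate_chunks, so the failure happens while pandas assembles the whole file into a single DataFrame. A possible workaround, as a minimal sketch only: read the file in pieces with the `chunksize` and `usecols` parameters of `pd.read_csv` and keep just the running result. The helper name `count_unique_chunked`, the chunk size, and the sample columns other than `place_id` are assumptions made for illustration, and the sketch assumes Python 3 rather than the Python 2.7 shown in the traceback.

```python
import io
import pandas as pd

def count_unique_chunked(source, column, chunksize=100000):
    """Count distinct values of `column` without loading the whole CSV.

    Hypothetical helper: name and default chunk size are illustrative.
    """
    seen = set()
    # usecols parses only the needed column; chunksize makes read_csv
    # return an iterator of DataFrames instead of one large DataFrame.
    for chunk in pd.read_csv(source, usecols=[column], chunksize=chunksize):
        seen.update(chunk[column].dropna().unique())
    return len(seen)

# Tiny in-memory stand-in for train.csv (made-up rows and column names)
sample_csv = io.StringIO(
    "row_id,x,y,place_id\n"
    "0,1.0,2.0,10\n"
    "1,1.5,2.5,11\n"
    "2,3.0,4.0,10\n"
)
n_places = count_unique_chunked(sample_csv, "place_id", chunksize=2)
print("Number of place ids: " + str(n_places))
```

With the real file, the call would be `count_unique_chunked('train.csv', 'place_id')`. Only one chunk and the set of ids are held in memory at a time.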
I have 4 GB of RAM. How much more would this script need?
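One rough way to approach that question, as a sketch under stated assumptions: load only the first rows with the `nrows` parameter of `pd.read_csv`, measure them with `DataFrame.memory_usage(deep=True)`, and scale up to the full row count. The helper name `estimate_memory_bytes`, the sample size, and the sample columns other than `place_id` are hypothetical. The result only approximates the size of the finished DataFrame; peak usage while parsing is typically higher, so it should be read as a lower bound.

```python
import io
import pandas as pd

def estimate_memory_bytes(source, total_rows, sample_rows=1000):
    """Extrapolate DataFrame size from the first `sample_rows` rows.

    Hypothetical helper; assumes the first rows are representative.
    """
    sample = pd.read_csv(source, nrows=sample_rows)
    # index=False leaves out the (small, fixed) index overhead so that
    # the per-row figure is not skewed by a tiny sample.
    total = sample.memory_usage(index=False, deep=True).sum()
    per_row = total / float(len(sample))
    return per_row * total_rows

# Tiny in-memory stand-in for train.csv (made-up rows and column names)
sample_csv = io.StringIO(
    "row_id,x,y,place_id\n"
    "0,1.0,2.0,10\n"
    "1,1.5,2.5,11\n"
    "2,3.0,4.0,10\n"
)
estimate = estimate_memory_bytes(sample_csv, total_rows=5000000)
print("Estimated DataFrame size: %.1f MB" % (estimate / 1e6))
```

With the real file, the call would be `estimate_memory_bytes('train.csv', total_rows=5000000)`, using the actual row count if it is known.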
See you later,
:mrgreen: