I'm taking a little break with Python/MapReduce while waiting to get back to Spark, which is as big a subject as Hadoop.
I adapted a Python program for Hadoop 2.6 in streaming mode.
If anyone is interested, having an example helps a lot.
Here is the first part of my TF-IDF; the second part may come this weekend.
Here is the content of my lettre1.txt file, which will be copied to HDFS: a document ID followed by the words to count.
The mapper, WCMapper.py:

Code:
#!/usr/bin/env python
# WCMapper.py -- reads "docID,word word word ..." lines and emits, for
# each word: word \t docID \t 1 \t <number of words on the line>
import sys

for enr in sys.stdin:
    enr = enr.rstrip('\n')
    fields = enr.split(",")
    documentID = fields[0]
    mots = fields[1].split(" ")
    for unMot in mots:
        print >>sys.stdout, "%s\t%s\t1\t%i" % (unMot, documentID, len(mots))
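For a quick check without a cluster, the mapper's per-line logic can be exercised as a plain function. This is only a local sketch; the sample line is made up in the expected "docID,words" format, not taken from the real lettre1.txt:

```python
# Local sketch of the mapper's per-line logic (hypothetical input line).
def map_line(line):
    doc_id, text = line.rstrip('\n').split(',', 1)
    mots = text.split(' ')
    # one record per word: word \t docID \t 1 \t words-on-the-line
    return ['%s\t%s\t1\t%d' % (mot, doc_id, len(mots)) for mot in mots]

for record in map_line('H1,Useful data is the thing.'):
    print(record)
```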
The reducer, WCReducer.py:

Code:
#!/usr/bin/env python
# WCReducer.py -- input arrives sorted on (word, docID).  Emits one line
# per (word, document) with the count and the term frequency, plus one
# line per word ("word \t 0000 \t n") giving the number of documents
# that contain it (needed later for the IDF).
import sys

motCourrant = ''
documentCourrant = ''
docCourrantCpt = 0     # occurrences of the word in the current document
nbDocs = 0             # documents containing the current word
totalMotsInDoc = 0.0

for enr in sys.stdin:
    fields = enr.rstrip('\n').split('\t')
    if len(fields) < 4:
        continue
    mot, doc = fields[0], fields[1]
    if mot != motCourrant or doc != documentCourrant:
        if docCourrantCpt > 0:
            # flush the previous (word, document) group
            print >>sys.stdout, "%s\t%s\t%i\t%f" % \
                (motCourrant, documentCourrant, docCourrantCpt,
                 docCourrantCpt / totalMotsInDoc)
        if mot != motCourrant:
            if motCourrant != '':
                # flush the previous word's document frequency
                print >>sys.stdout, "%s\t0000\t%i" % (motCourrant, nbDocs)
            motCourrant = mot
            nbDocs = 0
        documentCourrant = doc
        docCourrantCpt = 0
        nbDocs += 1
    docCourrantCpt += 1
    totalMotsInDoc = float(fields[3])

# flush the last (word, document) group and the last word
if docCourrantCpt > 0:
    print >>sys.stdout, "%s\t%s\t%i\t%f" % \
        (motCourrant, documentCourrant, docCourrantCpt,
         docCourrantCpt / totalMotsInDoc)
    print >>sys.stdout, "%s\t0000\t%i" % (motCourrant, nbDocs)
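The reducer relies on hadoop-streaming sorting the records on the first two fields, so all lines for one (word, document) pair arrive consecutively. The same grouping can be sketched in plain Python on toy records (made-up data, not the real job output):

```python
from itertools import groupby

# toy mapper output, already sorted on (word, doc) as the shuffle would do;
# each record is (word, doc, 1, words-on-the-source-line)
records = [
    ('A', 'H1', 1, 5), ('A', 'H2', 1, 5),
    ('is', 'H1', 1, 5), ('is', 'H2', 1, 5), ('is', 'H2', 1, 5),
]

tf = {}   # (word, doc) -> term frequency
df = {}   # word -> number of documents containing it
for (mot, doc), grp in groupby(records, key=lambda r: (r[0], r[1])):
    grp = list(grp)
    tf[(mot, doc)] = len(grp) / float(grp[0][3])
    df[mot] = df.get(mot, 0) + 1

print(tf[('is', 'H2')])   # 2 occurrences out of 5 words -> 0.4
print(df['A'])            # 'A' appears in 2 documents
```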
The launch script, wc.sh:

Code:
clean() {
  echo -e "removing directories on HDFS \n"
  hadoop fs -rm -r tempwc
  hadoop fs -rm -r output-streaming
  hadoop fs -rm -r Lettres
}

load() {
  echo -e "\n creating directories and loading data\n"
  hadoop fs -mkdir Lettres
  hadoop fs -copyFromLocal lettre1.txt Lettres
}

clean
load

echo -e "\nRunning Wordcount python hadoop 2.6 \n"
hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar \
  -input Lettres -output tempwc \
  -mapper WCMapper.py -reducer WCReducer.py \
  -jobconf stream.num.map.output.key.fields=2 \
  -jobconf stream.num.reduce.output.key.fields=2 \
  -jobconf mapred.reduce.tasks=1 \
  -file WCMapper.py -file WCReducer.py

echo -e "\nTF done, displaying the contents of the result directory\n"
hadoop fs -cat tempwc/*
Job output (excerpt):

Code:
hduser@stargate:~/Resource-Bundle/jbepy$ ./wc.sh
removing directories on HDFS

15/09/02 22:40:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
rm: `tempwc': No such file or directory
15/09/02 22:40:49 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
rm: `output-streaming': No such file or directory
15/09/02 22:40:51 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/09/02 22:40:51 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted Lettres

creating directories and loading data

15/09/02 22:40:53 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/09/02 22:40:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Running Wordcount python hadoop 2.6

15/09/02 22:40:56 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
15/09/02 22:40:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/09/02 22:40:56 WARN streaming.StreamJob: -jobconf option is deprecated, please use -D instead.
15/09/02 22:40:56 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
packageJobJar: [WCMapper.py, WCReducer.py, /tmp/hadoop-unjar5220630186872127756/] [] /tmp/streamjob529516712459132480.jar tmpDir=null
15/09/02 22:40:57 INFO client.RMProxy: Connecting to ResourceManager at stargate/192.168.0.11:8032
15/09/02 22:40:57 INFO client.RMProxy: Connecting to ResourceManager at stargate/192.168.0.11:8032
15/09/02 22:40:58 INFO mapred.FileInputFormat: Total input paths to process : 1
15/09/02 22:40:58 INFO mapreduce.JobSubmitter: number of splits:2
15/09/02 22:40:58 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1439891988258_0041
15/09/02 22:40:58 INFO impl.YarnClientImpl: Submitted application application_1439891988258_0041
15/09/02 22:40:58 INFO mapreduce.Job: The url to track the job: http://stargate:8088/proxy/applicati...91988258_0041/
15/09/02 22:40:58 INFO mapreduce.Job: Running job: job_1439891988258_0041
15/09/02 22:41:03 INFO mapreduce.Job: Job job_1439891988258_0041 running in uber mode : false
15/09/02 22:41:03 INFO mapreduce.Job: map 0% reduce 0%
15/09/02 22:41:09 INFO mapreduce.Job: map 100% reduce 0%
15/09/02 22:41:22 INFO mapreduce.Job: map 100% reduce 100%
15/09/02 22:41:22 INFO mapreduce.Job: Job job_1439891988258_0041 completed successfully
15/09/02 22:41:22 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=913
                FILE: Number of bytes written=337096
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=882
                HDFS: Number of bytes written=1871
                HDFS: Number of read operations=9
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=2
                Launched reduce tasks=1
                Data-local map tasks=2
                Total time spent by all maps in occupied slots (ms)=6714
                Total time spent by all reduces in occupied slots (ms)=21450
                Total time spent by all map tasks (ms)=6714
                Total time spent by all reduce tasks (ms)=10725
                Total vcore-seconds taken by all map tasks=6714
                Total vcore-seconds taken by all reduce tasks=10725
                Total megabyte-seconds taken by all map tasks=10312704
                Total megabyte-seconds taken by all reduce tasks=32947200
        Map-Reduce Framework
                Map input records=15
                Map output records=62
                Map output bytes=783
                Map output materialized bytes=919
                Input split bytes=216
                Combine input records=0
                Combine output records=0
                Reduce input groups=62
                Reduce shuffle bytes=919
                Reduce input records=62
                Reduce output records=113
                Spilled Records=124
                Shuffled Maps =2
                Failed Shuffles=0
                Merged Map outputs=2
                GC time elapsed (ms)=143
                CPU time spent (ms)=4900
                Physical memory (bytes) snapshot=1906573312
                Virtual memory (bytes) snapshot=7287844864
                Total committed heap usage (bytes)=2355101696
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=666
        File Output Format Counters
                Bytes Written=1871
15/09/02 22:41:22 INFO streaming.StreamJob: Output directory: tempwc

TF done, displaying the contents of the result directory

15/09/02 22:41:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
A	H1	1	0.200000
A	H2	1	0.200000
A	0000	2
An	H1	1	0.250000
An	H3	1	0.200000
An	0000	2
And	H2	1	0.200000
And	0000	1
Are	H2	1	0.500000
Are	0000	1
Because	H3	1	1.000000
Because	0000	1
But	H2	1	0.166667
But	0000	1
Elephant	H1	1	0.200000
Elephant	0000	1
HDFS	H2	1	0.200000
HDFS	0000	1
Hadoop	H1	1	0.200000
Hadoop	H3	1	0.200000
Hadoop	0000	2
Hadoop.	H2	1	0.200000
Hadoop.	0000	1
He	H1	1	0.333333
He	H3	1	0.250000
He	0000	2
Impala	H2	1	0.500000
Impala	0000	1
King!	H1	1	0.200000
King!	0000	1
Or	H3	1	0.250000
Or	0000	1
Sqoop.	H2	1	0.166667
Sqoop.	0000	1
The	H2	1	0.166667
The	0000	1
Useful	H1	1	0.500000
Useful	0000	1
an	H3	1	0.200000
an	0000	1
and	H1	1	0.200000
and	H3	1	0.200000
and	0000	2
anything	H3	1	0.250000
anything	0000	1
bad	H3	1	0.250000
bad	0000	1
cling!	H1	1	0.250000
cling!	0000	1
data	H1	1	0.500000
data	0000	1
does	H3	1	0.250000
does	0000	1
elegant	H1	1	0.200000
elegant	H3	1	0.200000
elegant	0000	2
element	H1	1	0.250000
element	0000	1
elephant	H2	1	0.166667
elephant	H3	1	0.200000
elephant	0000	2
extraneous	H1	1	0.250000
extraneous	0000	1
fellow.	H3	1	0.200000
fellow.	0000	1
forgets	H1	1	0.333333
forgets	0000	1
gentle	H3	1	0.200000
gentle	0000	1
gets	H3	1	0.250000
gets	0000	1
group.	H2	1	0.200000
group.	0000	1
helps	H2	1	0.166667
helps	0000	1
him	H2	1	0.166667
him	0000	1
in	H2	1	0.200000
in	0000	1
is	H1	1	0.200000
is	H2	1	0.200000
is	H3	1	0.200000
is	0000	3
king	H2	1	0.200000
king	0000	1
mad	H3	1	0.250000
mad	0000	1
mellow.	H3	1	0.200000
mellow.	0000	1
never	H1	1	0.333333
never	H3	1	0.250000
never	0000	2
plays	H2	1	0.166667
plays	0000	1
the	H1	1	0.200000
the	H2	1	0.200000
the	0000	2
thing.	H1	1	0.200000
thing.	0000	1
thrive	H2	1	0.166667
thrive	0000	1
to	H2	1	0.166667
to	0000	1
well	H2	1	0.166667
well	0000	1
what	H2	1	0.166667
what	0000	1
with	H2	1	0.166667
with	0000	1
wonderful	H2	1	0.200000
wonderful	0000	1
yellow	H1	1	0.200000
yellow	0000	1
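The "word \t 0000 \t n" lines above give, for each word, the number of documents that contain it; that is the ingredient the second part will need for the IDF. As a teaser, here is one common formula, idf = log(N/df) (only a guess at what part 2 will use), applied to a few document frequencies taken from the output above:

```python
import math

N = 3.0                                 # three documents here: H1, H2, H3
df = {'is': 3, 'A': 2, 'Because': 1}    # document frequencies from the output above

idf = dict((mot, math.log(N / n)) for mot, n in df.items())
print(idf['is'])   # log(3/3) = 0.0: a word present everywhere carries no signal
```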