Hadoop & co - Discussion: Hadoop ecosystem - Hive - HBase - Pig - MapReduce
  1. #1
    Hadoop ecosystem - Hive - HBase - Pig - MapReduce
    Hello,

    here I am again.

    I'm currently looking at Hive 0.14. So far I haven't had much trouble with the DML: since 0.14, tables stored in the ORC format can be made transactional,
    which allows UPDATE, DELETE and INSERT ... VALUES; each of these statements triggers a MapReduce job to apply the changes on the nodes.
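
    A minimal sketch of the kind of ACID-enabled ORC table this relies on (the table name agence_acid is just an illustration, and it assumes the transaction manager has been configured, e.g. hive.support.concurrency=true and hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager):

    -- ACID tables must be bucketed, stored as ORC and flagged transactional
    CREATE TABLE agence_acid (
    nom STRING,
    ville STRING,
    telephone STRING,
    code_departement STRING
    )
    CLUSTERED BY (code_departement) INTO 3 BUCKETS
    STORED AS ORC
    TBLPROPERTIES ('transactional'='true');

    -- each of these statements launches its own MapReduce job
    INSERT INTO TABLE agence_acid VALUES ('TOTAL', 'BORDEAUX', '05.01.01.01.02', '33');
    UPDATE agence_acid SET telephone = '05.00.00.00.00' WHERE nom = 'TOTAL';
    DELETE FROM agence_acid WHERE nom = 'TOTAL';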

    Before that, you could only load data into a table and run queries on it; there were a few workarounds but it was fairly limited.
    I have also used Hive to map and create HBase tables from temporary tables fed by CSV files.

    But I'm wondering how the data actually gets split up at the partition and/or bucket level.

    Large Hive tables can be split into partitions, OK: Hive organizes the data into a hierarchy of directories.

    create table agence (
    nom String,
    ville String,
    telephone String,
    code_departement String
    ) PARTITIONED BY (region String);

    In this case, one directory will be created per region for the agences.

    create table agence (
    nom String,
    ville String,
    telephone String,
    region String,
    code_departement String
    )
    clustered by (code_departement) into 3 buckets;

    We can have several departements in a single bucket file, but all the agences of a given code_departement will end up in one and the same bucket.

    But I don't really see how the two mechanisms interact. Any info?

    If there are too many partitions, do buckets make it possible to split the data up faster by running a larger number of MapReduce tasks
    in parallel?

    Edit: after a few tests.

    create table agence (
    nom String,
    ville String,
    telephone String,
    code_departement String
    )
    PARTITIONED BY (region String)
    CLUSTERED BY (code_departement) INTO 3 BUCKETS
    STORED AS ORC tblproperties ("orc.compress"="NONE");


    INSERT INTO TABLE agence PARTITION (region='Aquitaine') VALUES ('TOTAL', 'BORDEAUX', '05.01.01.01.02','33'), ('ESSO','TOULOUSE','09.09.09.09', '31') ;
    INSERT INTO TABLE agence PARTITION (region='IDF') VALUES ('ELF','PARIS','09.88.83.33.45','75');
    INSERT INTO TABLE agence PARTITION (region='PACA') VALUES ('BP','PARIS','06.32.54.31.53','13');


    I thought it would create three partitions for Aquitaine / IDF / PACA, and then, inside each of these partitions, manage the 3 buckets by departement,
    but curiously it seems to put everything into the same partition; something about the mechanism escapes me.

    Any idea? Well, I'll dig into it with a few more tests while rereading the doc; I must have missed something.

    Edit: I was simply not looking in the right place; it does exactly that and respects the hierarchy. Sometimes you just don't see what's right in front of you.

    /user/hive/warehouse/jbedb.db/agence

    My 3 partitions, each with 3 buckets that do contain my data. It seemed odd that the partition directory name contains the whole expression I put in PARTITION( ) even though the syntax is correct;
    suspicious at first glance, but that is simply Hive's standard key=value naming for partition directories:

    region=Aquitaine
    region=IDF
    region=PACA
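
    To double-check how the rows were spread over the buckets, a sketch like this should work (SHOW PARTITIONS lists the partition directories, and TABLESAMPLE reads back a single bucket; bucket numbers are 1-based; note that on a non-transactional table you may also need SET hive.enforce.bucketing=true before inserting so the bucket files actually get created):

    SHOW PARTITIONS agence;

    SELECT * FROM agence
    TABLESAMPLE(BUCKET 1 OUT OF 3 ON code_departement)
    WHERE region = 'Aquitaine';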

    JP

  2. #2
    I ran a few tests with Hive maps and arrays; I fumbled around a bit,
    but it's interesting.

    Here is the sample from my test file map.csv: '-' separates collection items, ':' separates key:value pairs, ',' separates the fields.

    Garfield-Odie,001:pizzat-000:Lasagne
    Mermal,002:leplusmignon
    Liz,003:veto
    Squeak,004:Souris


    The separators matter for loading the file's data into the right fields; I put an array as the first field and a map as the second field.

    CREATE TABLE test_map( monarray array<string>, mymap map<int,string> )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' COLLECTION ITEMS TERMINATED BY '-' MAP KEYS TERMINATED BY ':' LINES TERMINATED BY '\n';

    LOAD DATA LOCAL inpath '/tmp/map.csv' INTO TABLE test_map;

    0: jdbc:hive2://stargate:10000/jbedb> select * from test_map;
    +----------------------+---------------------------+--+
    | test_map.monarray | test_map.mymap |
    +----------------------+---------------------------+--+
    | ["Garfield","Odie"] | {1:"pizzat",0:"Lasagne"} |
    | ["Mermal"] | {2:"leplusmignon"} |
    | ["Liz"] | {3:"veto"} |
    | ["Squeak"] | {4:"Souris"} |
    +----------------------+---------------------------+--+
    4 rows selected (0,4 seconds)


    describe test_map;
    +-----------+------------------+----------+--+
    | col_name | data_type | comment |
    +-----------+------------------+----------+--+
    | monarray | array<string> | |
    | mymap | map<int,string> | |
    +-----------+------------------+----------+--+
    2 rows selected (0,689 seconds)


    I'm trying to find out whether the same thing can be done with structs; that would be great. To be continued.

  3. #3
    After some digging, I finally got structs working.

    So I initialized a CSV file as follows, respecting the separator mechanism:

    Paul Dufilo,47000,Mr E,Ass.Maladie:1400-Ass.Vieillesse:1900,lesueur-Tours-France-:41000
    Jacques Dupond,31000,,Ass.Maladie:1100-Ass.Vieillesse:1300,deshommes-Paris-France-75000
    Marcel Martin,19000,,Ass.Maladie:600-Ass.Vieillesse:800,delasomme-Bordeaux-France-33000

    Creation of a table mixing simple fields, an array, a map and a struct:

    CREATE TABLE salaries (
    nom STRING, salaire FLOAT,
    subordonnes ARRAY<STRING>,
    cotisations MAP<STRING, FLOAT>,
    adresse STRUCT<rue:STRING, ville:STRING, pays:STRING, cp:INT>
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    COLLECTION ITEMS TERMINATED BY '-'
    MAP KEYS TERMINATED BY ':' LINES TERMINATED BY '\n'
    STORED AS TEXTFILE;

    load data local inpath '/tmp/employees.csv' OVERWRITE into table salaries;


    0: jdbc:hive2://stargate:10000/jbedb> select * from salaries;
    +-----------------+-------------------+-----------------------+-------------------------------------------------+--------------------------------------------------------------------+--+
    | salaries.nom | salaries.salaire | salaries.subordonnes | salaries.cotisations | salaries.adresse |
    +-----------------+-------------------+-----------------------+-------------------------------------------------+--------------------------------------------------------------------+--+
    | Paul Dufilo | 47000.0 | ["Mr E"] | {"Ass.Maladie":1400.0,"Ass.Vieillesse":1900.0} | {"rue":"lesueur","ville":"Tours","pays":"France","cp":null} |
    | Jacques Dupond | 31000.0 | [] | {"Ass.Maladie":1100.0,"Ass.Vieillesse":1300.0} | {"rue":"deshommes","ville":"Paris","pays":"France","cp":75000} |
    | Marcel Martin | 19000.0 | [] | {"Ass.Maladie":600.0,"Ass.Vieillesse":800.0} | {"rue":"delasomme","ville":"Bordeaux","pays":"France","cp":33000} |
    +-----------------+-------------------+-----------------------+-------------------------------------------------+--------------------------------------------------------------------+--+


    Computation on elements of the map:

    0: jdbc:hive2://stargate:10000/jbedb>
    0: jdbc:hive2://stargate:10000/jbedb> select nom,round( salaire-(cotisations['Ass.Maladie']+cotisations['Ass.Vieillesse'])) from salaries;
    INFO : Number of reduce tasks is set to 0 since there's no reduce operator
    WARN : Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
    INFO : number of splits:1
    INFO : Submitting tokens for job: job_1433351652657_0011
    INFO : The url to track the job: http://stargate:8088/proxy/applicati...51652657_0011/
    INFO : Starting Job = job_1433351652657_0011, Tracking URL = http://stargate:8088/proxy/applicati...51652657_0011/
    INFO : Kill Command = /usr/local/hadoop/bin/hadoop job -kill job_1433351652657_0011
    INFO : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
    INFO : 2015-06-03 22:58:42,790 Stage-1 map = 0%, reduce = 0%
    INFO : 2015-06-03 22:58:47,953 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.7 sec
    INFO : MapReduce Total cumulative CPU time: 1 seconds 700 msec
    INFO : Ended Job = job_1433351652657_0011
    +-----------------+----------+--+
    | nom | _c1 |
    +-----------------+----------+--+
    | Paul Dufilo | 43700.0 |
    | Jacques Dupond | 28600.0 |
    | Marcel Martin | 17600.0 |
    +-----------------+----------+--+

    A quick CASE test on the salary value along the way:

    0: jdbc:hive2://stargate:10000/jbedb> select nom,round( (salaire-(cotisations['Ass.Maladie']+ cotisations['Ass.Vieillesse'])) ) ,
    . . . . . . . . . . . . . . . . . . > case
    . . . . . . . . . . . . . . . . . . > when salaire < 20000.0 THEN 'petit'
    . . . . . . . . . . . . . . . . . . > when salaire >= 20000.0 AND salaire<40000.0 THEN 'moyen'
    . . . . . . . . . . . . . . . . . . > when salaire >= 40000.0 THEN 'mieux'
    . . . . . . . . . . . . . . . . . . > else 'secret'
    . . . . . . . . . . . . . . . . . . > END AS categorie FROM salaries;
    INFO : Number of reduce tasks is set to 0 since there's no reduce operator
    WARN : Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
    INFO : number of splits:1
    INFO : Submitting tokens for job: job_1433351652657_0014
    INFO : The url to track the job: http://stargate:8088/proxy/applicati...51652657_0014/
    INFO : Starting Job = job_1433351652657_0014, Tracking URL = http://stargate:8088/proxy/applicati...51652657_0014/
    INFO : Kill Command = /usr/local/hadoop/bin/hadoop job -kill job_1433351652657_0014
    INFO : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
    INFO : 2015-06-03 23:20:30,916 Stage-1 map = 0%, reduce = 0%
    INFO : 2015-06-03 23:20:37,087 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.05 sec
    INFO : MapReduce Total cumulative CPU time: 2 seconds 50 msec
    INFO : Ended Job = job_1433351652657_0014
    +-----------------+----------+------------+--+
    | nom | _c1 | categorie |
    +-----------------+----------+------------+--+
    | Paul Dufilo | 43700.0 | mieux |
    | Jacques Dupond | 28600.0 | moyen |
    | Marcel Martin | 17600.0 | petit |
    +-----------------+----------+------------+--+
    3 rows selected (12,482 seconds)


    Querying a field of the struct:

    0: jdbc:hive2://stargate:10000/jbedb> select nom, adresse.ville from salaries;
    INFO : Number of reduce tasks is set to 0 since there's no reduce operator
    WARN : Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
    INFO : number of splits:1
    INFO : Submitting tokens for job: job_1433351652657_0019
    INFO : The url to track the job: http://stargate:8088/proxy/applicati...51652657_0019/
    INFO : Starting Job = job_1433351652657_0019, Tracking URL = http://stargate:8088/proxy/applicati...51652657_0019/
    INFO : Kill Command = /usr/local/hadoop/bin/hadoop job -kill job_1433351652657_0019
    INFO : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
    INFO : 2015-06-03 23:47:41,156 Stage-1 map = 0%, reduce = 0%
    INFO : 2015-06-03 23:47:54,817 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.97 sec
    +-----------------+-----------+--+
    | nom | ville |
    +-----------------+-----------+--+
    | Paul Dufilo | Tours |
    | Jacques Dupond | Paris |
    | Marcel Martin | Bordeaux |
    +-----------------+-----------+--+
    3 rows selected (30,401 seconds)
    INFO : MapReduce Total cumulative CPU time: 3 seconds 970 msec
    INFO : Ended Job = job_1433351652657_0019


    I haven't yet found how to sum the cotisations map values for one employee. To be continued.
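
    A sketch that should do it, by exploding the map into (key, value) rows and then aggregating (the aliases type_cotisation and montant are made up for the example):

    SELECT nom, SUM(montant) AS total_cotisations
    FROM salaries
    LATERAL VIEW explode(cotisations) c AS type_cotisation, montant
    GROUP BY nom;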

    I also need to look at whether a result can be computed into a temporary variable, if that is even possible
    in Hive.
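
    If that just means reusing a scalar value across statements, Hive's variable substitution may be enough; note that it only substitutes a literal, not the result of a query (the variable name plafond is made up):

    SET hivevar:plafond=40000;
    SELECT nom FROM salaries WHERE salaire > ${hivevar:plafond};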

    Note that there is no validation at all of what gets loaded into the table: you can put anything in, or end up with fields
    shifted relative to one another, e.g. the salary landing in the pays field.
    Not great.

  4. #4
    I looked at an interesting capability of Hive: UDFs (user-defined functions). You can add your own functions to it very easily.

    I used the example from the wiki, but it wasn't very explicit; I found an old script that was no longer suitable and brought it up to date
    for my Hadoop 2.6.0 setup.

    https://cwiki.apache.org/confluence/...ve/HivePlugins

    The UDF class we want to add:

    hduser@stargate:~/hive/function$ cat /home/hduser/udf/com/example/hive/udf/Lower.java

    package com.example.hive.udf;

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    public final class Lower extends UDF {
        public Text evaluate(final Text s) {
            if (s == null) { return null; }
            return new Text(s.toString().toLowerCase());
        }
    }
    The script I hacked together for Hadoop 2.6.0: it builds the classpath from the hadoop/hive jars, compiles the UDF class, builds the jar, and finally prints the usage for a test
    under the beeline interpreter.


    compile.sh

    #!/bin/bash
    set -x
    if [ "$1" == "" ]; then
        echo "Usage: $0 <java file>"
        exit 1
    fi

    CNAME=${1%.java}
    JARNAME=$CNAME.jar
    JARDIR=/tmp/hive_jars/$CNAME
    CLASSPATH=$(ls $HIVE_HOME/lib/hive-serde-*.jar):$(ls $HIVE_HOME/lib/hive-exec-*.jar):$(ls $HADOOP_HOME/share/hadoop/common/hadoop-common-?.?.?.jar)

    function tell {
        echo
        echo "$1 successfully compiled. In Hive run:"
        echo "$> add jar $JARNAME;"
        echo "$> create temporary function $CNAME as 'com.example.hive.udf.$CNAME';"
        echo
    }

    mkdir -p $JARDIR
    javac -classpath $CLASSPATH -d $JARDIR/ $1 && jar -cf $JARNAME -C $JARDIR/ . && tell $1
    It gives the result below.


    hduser@stargate:~/hive/function$ ./compile.sh ~/udf/com/example/hive/udf/Lower.java

    /home/hduser/udf/com/example/hive/udf/Lower.java successfully compiled. In Hive run:
    $> add jar /home/hduser/udf/com/example/hive/udf/Lower.jar;
    $> create temporary function /home/hduser/udf/com/example/hive/udf/Lower as 'com.example.hive.udf./home/hduser/udf/com/example/hive/udf/Lower';


    Under the beeline interpreter:

    Adding the resource:
    0: jdbc:hive2://stargate:10000/jbedb> add jar /home/hduser/udf/com/example/hive/udf/Lower.jar;
    INFO : Added [/home/hduser/udf/com/example/hive/udf/Lower.jar] to class path
    INFO : Added resources: [/home/hduser/udf/com/example/hive/udf/Lower.jar]
    No rows affected (0,017 seconds)

    Creating the function for the session:

    0: jdbc:hive2://stargate:10000/jbedb> create temporary function my_lower as 'com.example.hive.udf.Lower';
    No rows affected (0,013 seconds)

    Running the UDF in a query:

    0: jdbc:hive2://stargate:10000/jbedb> select my_lower(nom) from salaries;
    INFO : Number of reduce tasks is set to 0 since there's no reduce operator
    WARN : Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
    INFO : number of splits:1
    INFO : Submitting tokens for job: job_1433437271446_0007
    INFO : The url to track the job: http://stargate:8088/proxy/applicati...37271446_0007/
    INFO : Starting Job = job_1433437271446_0007, Tracking URL = http://stargate:8088/proxy/applicati...37271446_0007/
    INFO : Kill Command = /usr/local/hadoop/bin/hadoop job -kill job_1433437271446_0007
    INFO : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
    INFO : 2015-06-04 21:42:55,902 Stage-1 map = 0%, reduce = 0%
    INFO : 2015-06-04 21:43:10,586 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.86 sec
    INFO : MapReduce Total cumulative CPU time: 4 seconds 860 msec
    INFO : Ended Job = job_1433437271446_0007
    +-----------------+--+
    | _c0 |
    +-----------------+--+
    | paul dufilo |
    | jacques dupond |
    | marcel martin |
    +-----------------+--+
    3 rows selected (32,719 seconds)
    0: jdbc:hive2://stargate:10000/jbedb> select nom from salaries;
    +-----------------+--+
    | nom |
    +-----------------+--+
    | Paul Dufilo |
    | Jacques Dupond |
    | Marcel Martin |
    +-----------------+--+

    With the UDF applied, the uppercase letters are converted to lowercase. This example opens up a lot of possibilities in Hive, which is very extensible.
    I like it. Streaming inserts look nice too; I'll go find a concrete example.

    To be continued.

  5. #5
    Hive SerDe is a serialization/deserialization mechanism that lets Hive import data in formats it does not support natively;
    Hive provides a SerDe interface that the user has to implement.

    Example with a CSV file:

    http://ogrodnek.github.io/csv-serde/

    hduser@stargate:~$ cat /tmp/csv.txt
    aaa,bbb
    ccc,tddd
    eee,fff

    Under the beeline interpreter:

    Registering the implementation jar as a resource, then creating the table:
    add jar /tmp/csv-serde-1.1.2.jar;
    create table my_table_csv(a string, b string)
    row format serde 'com.bizo.hive.serde.csv.CSVSerde'
    with serdeproperties ( "separatorChar" = ",", "quoteChar" = "'", "escapeChar" = "\\" )
    stored as textfile;
    load data local inpath "/tmp/csv.txt" into table my_table_csv;
    No rows affected (0,574 seconds)


    0: jdbc:hive2://stargate:10000/jbedb> select * from my_table_csv;
    +-----------------+-----------------+--+
    | my_table_csv.a | my_table_csv.b |
    +-----------------+-----------------+--+
    | aaa | bbb |
    | ccc | tddd |
    | eee | fff |
    +-----------------+-----------------+--+

    Same thing with JSON:

    https://code.google.com/p/hive-json-...GettingStarted

    hduser@stargate:~$ cat /tmp/json.txt
    {"field1":"data1","field2":100,"field3":"more data1","field4":123.001}
    {"field1":"data2","field2":200,"field3":"more data2","field4":123.002}
    {"field1":"data3","field2":300,"field3":"more data3","field4":123.003}
    {"field1":"data4","field2":400,"field3":"more data4","field4":123.004}

    In beeline, we add the resource:

    0: jdbc:hive2://stargate:10000/jbedb>
    add jar /tmp/hive-json-serde-0.2.jar
    INFO : Added [/tmp/hive-json-serde-0.2.jar] to class path
    INFO : Added resources: [/tmp/hive-json-serde-0.2.jar]
    No rows affected (0,06 seconds)

    0: jdbc:hive2://stargate:10000/jbedb>

    CREATE EXTERNAL TABLE IF NOT EXISTS my_table (field1 string, field2 int, field3 string, field4 double )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde';

    0: jdbc:hive2://stargate:10000/jbedb> load data local inpath "/tmp/json.txt" into table my_table;
    INFO : Loading data to table jbedb.my_table from file:/tmp/json.txt
    INFO : Table jbedb.my_table stats: [numFiles=0, totalSize=0]
    0: jdbc:hive2://stargate:10000/jbedb> select * from my_table;
    +------------------+------------------+------------------+------------------+--+
    | my_table.field1 | my_table.field2 | my_table.field3 | my_table.field4 |
    +------------------+------------------+------------------+------------------+--+
    | data1 | 100 | more data1 | 123.001 |
    | data2 | 200 | more data2 | 123.002 |
    | data3 | 300 | more data3 | 123.003 |
    | data4 | 400 | more data4 | 123.004 |
    +------------------+------------------+------------------+------------------+--+
    4 rows selected (0,595 seconds)


    You can implement the initialize(), serialize() and deserialize() methods of the SerDe interface yourself to support yet another format; I find that pretty neat.


    How-to: Use a SerDe in Apache Hive

    http://blog.cloudera.com/blog/2012/1...n-apache-hive/

    That was my last chapter on this; I know enough for now, so I'm moving on to Pig & Hive & HBase.

    To be continued.

  6. #6
    OK, Pig and Hive.

    To sum up, there are roughly a handful of main keywords (LOAD, DUMP, FILTER, GROUP, JOIN, FOREACH, STORE),
    plus a few extra operators (CROSS, SPLIT, ...).

    Not so simple: I ran into several problems tied to the configuration.

    Pig has to use the Hive metastore, but for that the metastore has to agree to talk to Pig.

    I had to rework the metastore configuration to finally be able to use the Hive tables; it may come in handy:


    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://stargate:9083</value>
        <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
    </property>

    Then you have to set the environment variables in your .bashrc, not forgetting the pig adapter:

    export PIG_CLASSPATH=$HCAT_HOME/share/hcatalog/hcatalog-*.jar:\
    $HCAT_HOME/share/hcatalog/hcatalog-pig-adapter*.jar:\
    $HIVE_HOME/lib/hive-metastore-*.jar:$HIVE_HOME/lib/libthrift-*.jar:\
    $HIVE_HOME/lib/hive-exec-*.jar:$HIVE_HOME/lib/libfb303-*.jar:\
    $HIVE_HOME/lib/jdo2-api-*-ec.jar:$HIVE_HOME/conf:$HADOOP_HOME/etc/hadoop:\
    $HIVE_HOME/lib/slf4j-api-*.jar
    export PIG_OPTS=-Dhive.metastore.uris=thrift://stargate:9083

    And in /usr/local/pig/conf you have to point pig.properties at the .pigbootup file; oddly it doesn't seem
    to pick up the environment variables, I'll look into that later.

    pig.load.default.statements=/usr/local/pig/.pigbootup

    hduser@stargate:/usr/local/pig$ cat .pigbootup
    REGISTER /usr/local/hive/hcatalog/share/hcatalog/hive-hcatalog-core-0.14.0.jar;
    REGISTER /usr/local/hive//lib/hive-exec-0.14.0.jar;
    REGISTER /usr/local/hive/lib/hive-metastore-0.14.0.jar;

    Very important: the Hive metastore must be started before hiveserver2 to avoid problems.

    First the metastore:

    nohup hive --service metastore &

    Check that the metastore is indeed listening on its port:
    netstat -an | grep 9083

    Then:
    nohup /usr/local/hive/bin/hiveserver2 >$HIVE_LOG_DIR/hiveServer2.out 2>$HIVE_LOG_DIR/hiveServer2.log &


    Icing on the cake: you have to be careful about the package name used in the LOAD to reach the metastore, otherwise it fails with
    ERROR 1070 (problem resolving the import); it simply doesn't know the class.

    ventes = LOAD 'jbedb.vente' USING org.apache.hive.hcatalog.pig.HCatLoader();


    First step, in Hive: I load a dummy sales CSV file into a table that Pig will then use through the metastore;
    Pig will build a recap with a total per customer and store it in a result table.

    1001,Platini,Menage,2000
    1002,Zidane,Menage,500
    1001,Platini,Menage,600
    1002,Zidane,Ordinateur,1000
    1001,Platini,Ordinateur,500
    1002,Zidane,Menage,1000
    1002,Zidane,Ordinateur,600
    1001,Platini,Menage,700
    1002,Zidane,Ordinateur,800



    beeline> !connect jdbc:hive2://stargate:10000/jbedb hduser servus
    scan complete in 4ms
    Connecting to jdbc:hive2://stargate:10000/jbedb
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/usr/local/apache-hive-0.14.0-bin/lib/hive-jdbc-0.14.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
    Connected to: Apache Hive (version 0.14.0)
    Driver: Hive JDBC (version 0.14.0)
    Transaction isolation: TRANSACTION_REPEATABLE_READ
    0: jdbc:hive2://stargate:10000/jbedb> truncate table ventes;
    No rows affected (0,656 seconds)
    0: jdbc:hive2://stargate:10000/jbedb> drop table ventes;
    No rows affected (0,889 seconds)
    0: jdbc:hive2://stargate:10000/jbedb> CREATE TABLE ventes ( custId int, custName String, productType String, value Int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
    No rows affected (0,512 seconds)
    0: jdbc:hive2://stargate:10000/jbedb>LOAD DATA LOCAL inpath '/usr/local/pig/SalesData.csv' overwrite into table ventes;
    No rows affected (0,793 seconds)
    INFO : Loading data to table jbedb.ventes from file:/usr/local/pig/SalesData.csv
    INFO : Table jbedb.ventes stats: [numFiles=1, numRows=0, totalSize=230, rawDataSize=0]
    0: jdbc:hive2://stargate:10000/jbedb> select * from ventes;
    +----------------+------------------+---------------------+---------------+--+
    | ventes.custid | ventes.custname | ventes.producttype | ventes.value |
    +----------------+------------------+---------------------+---------------+--+
    | 1001 | Platini | Menage | 2000 |
    | 1002 | Zidane | Menage | 500 |
    | 1001 | Platini | Menage | 600 |
    | 1002 | Zidane | Ordinateur | 1000 |
    | 1001 | Platini | Ordinateur | 500 |
    | 1002 | Zidane | Menage | 1000 |
    | 1002 | Zidane | Ordinateur | 600 |
    | 1001 | Platini | Menage | 700 |
    | 1002 | Zidane | Ordinateur | 800 |
    +----------------+------------------+---------------------+---------------+--+
    9 rows selected (0,469 seconds)

    Now I launch the second step with Pig in distributed mode, on top of the Hive metastore.


    hduser@stargate:/usr/local/pig/bin$ pig -x mapreduce -useHCatalog
    ls: impossible d'accéder à /usr/local/hive/lib/slf4j-api-*.jar: Aucun fichier ou dossier de ce type
    ls: impossible d'accéder à /usr/local/hive/hcatalog/lib/*hbase-storage-handler-*.jar: Aucun fichier ou dossier de ce type
    15/06/07 16:59:20 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
    15/06/07 16:59:20 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
    15/06/07 16:59:20 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
    2015-06-07 16:59:21,031 [main] INFO org.apache.pig.Main - Apache Pig version 0.14.0 (r1640057) compiled Nov 16 2014, 18:02:05
    2015-06-07 16:59:21,032 [main] INFO org.apache.pig.Main - Logging error messages to: /usr/local/pig-0.14.0/bin/pig_1433689161031.log
    2015-06-07 16:59:21,527 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 16:59:21,530 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 16:59:21,530 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://stargate:9000
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/usr/local/hbase-0.98.4-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
    2015-06-07 16:59:21,718 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    2015-06-07 16:59:22,179 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 16:59:22,179 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: stargate:8050
    2015-06-07 16:59:22,179 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 16:59:22,240 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 16:59:22,241 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 16:59:22,275 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 16:59:22,277 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 16:59:22,307 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 16:59:22,308 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 16:59:22,340 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 16:59:22,342 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 16:59:22,371 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 16:59:22,373 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 16:59:22,402 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 16:59:22,404 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 16:59:22,431 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 16:59:22,432 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 16:59:22,458 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 16:59:22,460 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    grunt> REGISTER /usr/local/hive/hcatalog/share/hcatalog/hive-hcatalog-core-0.14.0.jar;
    2015-06-07 16:59:22,655 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 16:59:22,655 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    grunt> REGISTER /usr/local/hive//lib/hive-exec-0.14.0.jar;
    2015-06-07 16:59:22,678 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 16:59:22,679 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    grunt> REGISTER /usr/local/hive/lib/hive-metastore-0.14.0.jar;
    2015-06-07 16:59:22,701 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 16:59:22,701 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    grunt> ventes = LOAD 'jbedb.vente' USING org.apache.hive.hcatalog.pig.HCatLoader();
    2015-06-07 17:00:40,503 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 17:00:40,504 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 17:00:40,873 [main] WARN org.apache.hadoop.hive.conf.HiveConf - DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
    2015-06-07 17:00:40,875 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.attempts does not exist
    2015-06-07 17:00:40,875 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.interval does not exist
    2015-06-07 17:00:40,875 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.stats.map.parallelism does not exist
    2015-06-07 17:00:40,941 [main] INFO hive.metastore - Trying to connect to metastore with URI thrift://stargate:9083
    2015-06-07 17:00:40,994 [main] INFO hive.metastore - Connected to metastore.

    grunt> ventes = LOAD 'jbedb.ventes' USING org.apache.hive.hcatalog.pig.HCatLoader();
    2015-06-07 17:02:49,988 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 17:02:49,988 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 17:02:50,012 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 17:02:50,012 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 17:02:50,061 [main] WARN org.apache.hadoop.hive.conf.HiveConf - DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
    2015-06-07 17:02:50,061 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.attempts does not exist
    2015-06-07 17:02:50,062 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.interval does not exist
    2015-06-07 17:02:50,062 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.stats.map.parallelism does not exist
    2015-06-07 17:02:50,063 [main] INFO hive.metastore - Trying to connect to metastore with URI thrift://stargate:9083
    2015-06-07 17:02:50,063 [main] INFO hive.metastore - Connected to metastore.
    2015-06-07 17:02:50,227 [main] WARN org.apache.hadoop.hive.conf.HiveConf - DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
    2015-06-07 17:02:50,228 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.attempts does not exist
    2015-06-07 17:02:50,228 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.interval does not exist
    2015-06-07 17:02:50,228 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.stats.map.parallelism does not exist
    2015-06-07 17:02:50,308 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 17:02:50,308 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 17:02:50,341 [main] WARN org.apache.hadoop.hive.conf.HiveConf - DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
    2015-06-07 17:02:50,341 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.attempts does not exist
    2015-06-07 17:02:50,342 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.interval does not exist
    2015-06-07 17:02:50,342 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.stats.map.parallelism does not exist
    grunt> dump
    2015-06-07 17:03:10,146 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 17:03:10,146 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 17:03:10,179 [main] WARN org.apache.hadoop.hive.conf.HiveConf - DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
    2015-06-07 17:03:10,180 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.attempts does not exist
    2015-06-07 17:03:10,180 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.interval does not exist
    2015-06-07 17:03:10,180 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.stats.map.parallelism does not exist
    2015-06-07 17:03:10,199 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 17:03:10,199 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 17:03:10,204 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
    2015-06-07 17:03:10,228 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 17:03:10,228 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 17:03:10,232 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
    2015-06-07 17:03:10,262 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
    2015-06-07 17:03:10,371 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
    2015-06-07 17:03:10,394 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
    2015-06-07 17:03:10,394 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
    2015-06-07 17:03:10,416 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 17:03:10,440 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-07 17:03:10,596 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
    2015-06-07 17:03:10,600 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent
    2015-06-07 17:03:10,600 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
    2015-06-07 17:03:10,600 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
    2015-06-07 17:03:10,631 [main] WARN org.apache.hadoop.hive.conf.HiveConf - DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
    2015-06-07 17:03:10,632 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.attempts does not exist
    2015-06-07 17:03:10,632 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.interval does not exist
    2015-06-07 17:03:10,632 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.stats.map.parallelism does not exist
    2015-06-07 17:03:10,760 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 17:03:10,761 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 17:03:10,761 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
    2015-06-07 17:03:10,761 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
    2015-06-07 17:03:11,107 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/lib/hive-metastore-0.14.0.jar to DistributedCache through /tmp/temp-623380625/tmp1308536430/hive-metastore-0.14.0.jar
    2015-06-07 17:03:11,137 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/lib/libthrift-0.9.0.jar to DistributedCache through /tmp/temp-623380625/tmp409944007/libthrift-0.9.0.jar
    2015-06-07 17:03:11,304 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/lib/hive-exec-0.14.0.jar to DistributedCache through /tmp/temp-623380625/tmp-1125580366/hive-exec-0.14.0.jar
    2015-06-07 17:03:11,404 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/lib/libfb303-0.9.0.jar to DistributedCache through /tmp/temp-623380625/tmp655893709/libfb303-0.9.0.jar
    2015-06-07 17:03:11,503 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/lib/jdo-api-3.0.1.jar to DistributedCache through /tmp/temp-623380625/tmp-125161463/jdo-api-3.0.1.jar
    2015-06-07 17:03:11,595 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/lib/hive-hbase-handler-0.14.0.jar to DistributedCache through /tmp/temp-623380625/tmp-1525857397/hive-hbase-handler-0.14.0.jar
    2015-06-07 17:03:11,628 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/hcatalog/share/hcatalog/hive-hcatalog-core-0.14.0.jar to DistributedCache through /tmp/temp-623380625/tmp1869088516/hive-hcatalog-core-0.14.0.jar
    2015-06-07 17:03:11,661 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/hcatalog/share/hcatalog/hive-hcatalog-pig-adapter-0.14.0.jar to DistributedCache through /tmp/temp-623380625/tmp570675543/hive-hcatalog-pig-adapter-0.14.0.jar
    2015-06-07 17:03:11,703 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig-0.14.0/pig-0.14.0-core-h2.jar to DistributedCache through /tmp/temp-623380625/tmp-1688609819/pig-0.14.0-core-h2.jar
    2015-06-07 17:03:11,736 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig-0.14.0/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp-623380625/tmp596228608/automaton-1.11-8.jar
    2015-06-07 17:03:11,761 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig-0.14.0/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp-623380625/tmp-1959640922/antlr-runtime-3.4.jar
    2015-06-07 17:03:11,795 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig-0.14.0/lib/joda-time-2.1.jar to DistributedCache through /tmp/temp-623380625/tmp73397238/joda-time-2.1.jar
    2015-06-07 17:03:11,835 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
    2015-06-07 17:03:11,914 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
    2015-06-07 17:03:11,915 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.http.address is deprecated. Instead, use mapreduce.jobtracker.http.address
    2015-06-07 17:03:11,915 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 17:03:11,918 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-07 17:03:11,934 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 17:03:12,003 [JobControl] WARN org.apache.hadoop.mapreduce.JobSubmitter - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
    2015-06-07 17:03:12,090 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
    2015-06-07 17:03:12,109 [JobControl] INFO org.apache.hadoop.mapred.FileInputFormat - Total input paths to process : 1
    2015-06-07 17:03:12,117 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
    2015-06-07 17:03:12,244 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
    2015-06-07 17:03:12,344 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1433663908007_0005
    2015-06-07 17:03:12,459 [JobControl] INFO org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
    2015-06-07 17:03:12,567 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1433663908007_0005
    2015-06-07 17:03:12,634 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://stargate:8088/proxy/applicati...63908007_0005/
    2015-06-07 17:03:12,634 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1433663908007_0005
    2015-06-07 17:03:12,634 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases ventes
    2015-06-07 17:03:12,635 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: ventes[1,9] C: R:
    2015-06-07 17:03:12,644 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
    2015-06-07 17:03:12,644 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1433663908007_0005]
    2015-06-07 17:03:34,714 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
    2015-06-07 17:03:34,714 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1433663908007_0005]
    2015-06-07 17:03:37,725 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-07 17:03:37,733 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
    2015-06-07 17:03:37,920 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-07 17:03:37,926 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
    2015-06-07 17:03:37,965 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
    2015-06-07 17:03:37,967 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-07 17:03:37,971 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
    2015-06-07 17:03:38,020 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
    2015-06-07 17:03:38,022 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:

    HadoopVersion PigVersion UserId StartedAt FinishedAt Features
    2.6.0 0.14.0 hduser 2015-06-07 17:03:10 2015-06-07 17:03:38 UNKNOWN

    Success!

    Job Stats (time in seconds):
    JobId Maps Reduces MaxMapTime MinMapTime AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs
    job_1433663908007_0005 1 0 14 14 14 14 0 0 0 0 ventes MAP_ONLY hdfs://stargate:9000/tmp/temp-623380625/tmp-301340857,

    Input(s):
    Successfully read 9 records (12422 bytes) from: "jbedb.ventes"

    Output(s):
    Successfully stored 9 records (272 bytes) in: "hdfs://stargate:9000/tmp/temp-623380625/tmp-301340857"

    Counters:
    Total records written : 9
    Total bytes written : 272
    Spillable Memory Manager spill count : 0
    Total bags proactively spilled: 0
    Total records proactively spilled: 0

    Job DAG:
    job_1433663908007_0005


    2015-06-07 17:03:38,024 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-07 17:03:38,028 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
    2015-06-07 17:03:38,060 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-07 17:03:38,063 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
    2015-06-07 17:03:38,087 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-07 17:03:38,090 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
    2015-06-07 17:03:38,122 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
    2015-06-07 17:03:38,125 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 17:03:38,125 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 17:03:38,125 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
    2015-06-07 17:03:38,134 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
    2015-06-07 17:03:38,134 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
    (1001,Platini,Menage,2000)
    (1002,Zidane,Menage,500)
    (1001,Platini,Menage,600)
    (1002,Zidane,Ordinateur,1000)
    (1001,Platini,Ordinateur,500)
    (1002,Zidane,Menage,1000)
    (1002,Zidane,Ordinateur,600)
    (1001,Platini,Menage,700)
    (1002,Zidane,Ordinateur,800)


    All that just to be able to process a file in Pig starting from Hive.
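
    For the recap step announced above, as far as I know the result table has to already exist in Hive before Pig can store into it with org.apache.hive.hcatalog.pig.HCatStorer; a possible sketch (table and column names are my own, not taken from the actual run):

    CREATE TABLE ventes_recap (
    custname STRING,
    total INT
    )
    STORED AS TEXTFILE;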

  7. #7
    We saw the LOAD earlier, which needed access to the Hive metastore. I could have read the CSV file directly by using
    USING PigStorage(',') instead of the metastore API, but going through the db is more fun.

    Example: the file is loaded directly, but then you have to describe the separator and the list of fields with their types:
    ventes= LOAD 'SalesData.csv' using PigStorage (',') as (custId:int, custName:chararray, producttype:chararray, value:int );

    Now we can apply a filter with a condition, which has the effect of reducing the list.

    We apply the filter:

    grunt> ordiVendus = FILTER ventes BY producttype == 'Ordinateur';

    2015-06-07 17:42:54,204 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 17:42:54,204 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 17:42:54,235 [main] WARN org.apache.hadoop.hive.conf.HiveConf - DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
    2015-06-07 17:42:54,235 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.attempts does not exist
    2015-06-07 17:42:54,235 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.interval does not exist
    2015-06-07 17:42:54,236 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.stats.map.parallelism does not exist
    2015-06-07 17:42:54,267 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 17:42:54,267 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 17:42:54,292 [main] WARN org.apache.hadoop.hive.conf.HiveConf - DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
    2015-06-07 17:42:54,292 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.attempts does not exist
    2015-06-07 17:42:54,292 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.interval does not exist
    2015-06-07 17:42:54,292 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.stats.map.parallelism does not exist

    We process the content to display it; the list will be reduced:
    grunt> dump ordiVendus;
    2015-06-07 17:43:02,464 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 17:43:02,464 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 17:43:02,491 [main] WARN org.apache.hadoop.hive.conf.HiveConf - DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
    2015-06-07 17:43:02,491 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.attempts does not exist
    2015-06-07 17:43:02,491 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.interval does not exist
    2015-06-07 17:43:02,491 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.stats.map.parallelism does not exist
    2015-06-07 17:43:02,512 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 17:43:02,512 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 17:43:02,514 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: FILTER
    2015-06-07 17:43:02,534 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 17:43:02,534 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 17:43:02,534 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
    2015-06-07 17:43:02,535 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
    2015-06-07 17:43:02,541 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
    2015-06-07 17:43:02,542 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
    2015-06-07 17:43:02,542 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
    2015-06-07 17:43:02,551 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 17:43:02,552 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-07 17:43:02,554 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
    2015-06-07 17:43:02,555 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
    2015-06-07 17:43:02,577 [main] WARN org.apache.hadoop.hive.conf.HiveConf - DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
    2015-06-07 17:43:02,577 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.attempts does not exist
    2015-06-07 17:43:02,577 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.interval does not exist
    2015-06-07 17:43:02,577 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.stats.map.parallelism does not exist
    2015-06-07 17:43:02,614 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 17:43:02,614 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 17:43:02,614 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
    2015-06-07 17:43:02,692 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/lib/hive-metastore-0.14.0.jar to DistributedCache through /tmp/temp-623380625/tmp1468266506/hive-metastore-0.14.0.jar
    2015-06-07 17:43:02,725 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/lib/libthrift-0.9.0.jar to DistributedCache through /tmp/temp-623380625/tmp-654533098/libthrift-0.9.0.jar
    2015-06-07 17:43:02,809 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/lib/hive-exec-0.14.0.jar to DistributedCache through /tmp/temp-623380625/tmp-1389471049/hive-exec-0.14.0.jar
    2015-06-07 17:43:02,842 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/lib/libfb303-0.9.0.jar to DistributedCache through /tmp/temp-623380625/tmp1345463028/libfb303-0.9.0.jar
    2015-06-07 17:43:02,875 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/lib/jdo-api-3.0.1.jar to DistributedCache through /tmp/temp-623380625/tmp-93693194/jdo-api-3.0.1.jar
    2015-06-07 17:43:02,908 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/lib/hive-hbase-handler-0.14.0.jar to DistributedCache through /tmp/temp-623380625/tmp796598022/hive-hbase-handler-0.14.0.jar
    2015-06-07 17:43:02,942 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/hcatalog/share/hcatalog/hive-hcatalog-core-0.14.0.jar to DistributedCache through /tmp/temp-623380625/tmp-2114612693/hive-hcatalog-core-0.14.0.jar
    2015-06-07 17:43:02,967 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/hcatalog/share/hcatalog/hive-hcatalog-pig-adapter-0.14.0.jar to DistributedCache through /tmp/temp-623380625/tmp-1599892601/hive-hcatalog-pig-adapter-0.14.0.jar
    2015-06-07 17:43:03,017 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig-0.14.0/pig-0.14.0-core-h2.jar to DistributedCache through /tmp/temp-623380625/tmp1524629778/pig-0.14.0-core-h2.jar
    2015-06-07 17:43:03,058 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig-0.14.0/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp-623380625/tmp63877365/automaton-1.11-8.jar
    2015-06-07 17:43:03,092 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig-0.14.0/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp-623380625/tmp800259486/antlr-runtime-3.4.jar
    2015-06-07 17:43:03,125 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig-0.14.0/lib/joda-time-2.1.jar to DistributedCache through /tmp/temp-623380625/tmp-161986306/joda-time-2.1.jar
    2015-06-07 17:43:03,144 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
    2015-06-07 17:43:03,181 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
    2015-06-07 17:43:03,183 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-07 17:43:03,216 [JobControl] WARN org.apache.hadoop.mapreduce.JobSubmitter - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
    2015-06-07 17:43:03,291 [JobControl] INFO org.apache.hadoop.mapred.FileInputFormat - Total input paths to process : 1
    2015-06-07 17:43:03,292 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
    2015-06-07 17:43:03,399 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
    2015-06-07 17:43:03,449 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1433663908007_0006
    2015-06-07 17:43:03,455 [JobControl] INFO org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
    2015-06-07 17:43:03,519 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1433663908007_0006
    2015-06-07 17:43:03,523 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://stargate:8088/proxy/applicati...63908007_0006/
    2015-06-07 17:43:03,682 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1433663908007_0006
    2015-06-07 17:43:03,682 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases ordiVendus,ventes
    2015-06-07 17:43:03,682 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: ventes[1,9],ordiVendus[2,13] C: R:
    2015-06-07 17:43:03,688 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
    2015-06-07 17:43:03,688 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1433663908007_0006]
    2015-06-07 17:43:25,718 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
    2015-06-07 17:43:25,718 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1433663908007_0006]
    2015-06-07 17:43:28,726 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-07 17:43:28,733 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
    2015-06-07 17:43:28,861 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-07 17:43:28,864 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
    2015-06-07 17:43:28,890 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
    2015-06-07 17:43:28,890 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-07 17:43:28,894 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
    2015-06-07 17:43:28,929 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
    2015-06-07 17:43:28,929 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:

    HadoopVersion PigVersion UserId StartedAt FinishedAt Features
    2.6.0 0.14.0 hduser 2015-06-07 17:43:02 2015-06-07 17:43:28 FILTER

    Success!

    Job Stats (time in seconds):
    JobId Maps Reduces MaxMapTime MinMapTime AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs
    job_1433663908007_0006 1 0 14 14 14 14 0 0 0 0 ordiVendus,ventes MAP_ONLY hdfs://stargate:9000/tmp/temp-623380625/tmp1336548572,

    Input(s):
    Successfully read 9 records (12422 bytes) from: "jbedb.ventes"

    Output(s):
    Successfully stored 4 records (129 bytes) in: "hdfs://stargate:9000/tmp/temp-623380625/tmp1336548572"

    Counters:
    Total records written : 4
    Total bytes written : 129
    Spillable Memory Manager spill count : 0
    Total bags proactively spilled: 0
    Total records proactively spilled: 0

    Job DAG:
    job_1433663908007_0006


    2015-06-07 17:43:28,930 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-07 17:43:28,935 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
    2015-06-07 17:43:28,964 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-07 17:43:28,968 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
    2015-06-07 17:43:28,991 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-07 17:43:28,995 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
    2015-06-07 17:43:29,026 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
    2015-06-07 17:43:29,027 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 17:43:29,027 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 17:43:29,027 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
    2015-06-07 17:43:29,035 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
    2015-06-07 17:43:29,035 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
    (1002,Zidane,Ordinateur,1000)
    (1001,Platini,Ordinateur,500)
    (1002,Zidane,Ordinateur,600)
    (1002,Zidane,Ordinateur,800)


    We have reduced the list. It is verbose; Pig is not exactly economical with its logging.
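
    (As an aside, and untested on this setup: Pig can be pointed at a custom Log4j configuration to quiet this down, e.g. pig -4 my-log4j.properties, where my-log4j.properties is a file you supply with a higher log level.)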

  8. #8
    Still after the LOAD, with the FILTER still active on the selected data, we apply a GROUP BY on the customer name and then dump the result.

    grunt> groupeVendus = GROUP ordiVendus BY custname;
    2015-06-07 18:04:29,026 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 18:04:29,026 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 18:04:29,057 [main] WARN org.apache.hadoop.hive.conf.HiveConf - DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
    2015-06-07 18:04:29,057 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.attempts does not exist
    2015-06-07 18:04:29,057 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.interval does not exist
    2015-06-07 18:04:29,057 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.stats.map.parallelism does not exist
    2015-06-07 18:04:29,081 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 18:04:29,081 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 18:04:29,108 [main] WARN org.apache.hadoop.hive.conf.HiveConf - DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
    2015-06-07 18:04:29,108 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.attempts does not exist
    2015-06-07 18:04:29,108 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.interval does not exist
    2015-06-07 18:04:29,108 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.stats.map.parallelism does not exist
    grunt> dump groupeVendus;
    2015-06-07 18:04:41,725 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 18:04:41,725 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 18:04:41,752 [main] WARN org.apache.hadoop.hive.conf.HiveConf - DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
    2015-06-07 18:04:41,752 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.attempts does not exist
    2015-06-07 18:04:41,752 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.interval does not exist
    2015-06-07 18:04:41,752 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.stats.map.parallelism does not exist
    2015-06-07 18:04:41,774 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 18:04:41,774 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 18:04:41,776 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY,FILTER
    2015-06-07 18:04:41,798 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 18:04:41,798 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 18:04:41,798 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
    2015-06-07 18:04:41,798 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
    2015-06-07 18:04:41,804 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
    2015-06-07 18:04:41,808 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
    2015-06-07 18:04:41,808 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
    2015-06-07 18:04:41,816 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 18:04:41,817 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-07 18:04:41,819 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
    2015-06-07 18:04:41,820 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
    2015-06-07 18:04:41,842 [main] WARN org.apache.hadoop.hive.conf.HiveConf - DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
    2015-06-07 18:04:41,843 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.attempts does not exist
    2015-06-07 18:04:41,843 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.interval does not exist
    2015-06-07 18:04:41,843 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.stats.map.parallelism does not exist
    2015-06-07 18:04:41,873 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 18:04:41,874 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 18:04:41,874 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
    2015-06-07 18:04:41,874 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
    2015-06-07 18:04:41,900 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=0
    2015-06-07 18:04:41,900 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
    2015-06-07 18:04:41,900 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
    2015-06-07 18:04:41,900 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
    2015-06-07 18:04:41,975 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/lib/hive-metastore-0.14.0.jar to DistributedCache through /tmp/temp-623380625/tmp-759461806/hive-metastore-0.14.0.jar
    2015-06-07 18:04:42,000 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/lib/libthrift-0.9.0.jar to DistributedCache through /tmp/temp-623380625/tmp-2001077575/libthrift-0.9.0.jar
    2015-06-07 18:04:42,091 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/lib/hive-exec-0.14.0.jar to DistributedCache through /tmp/temp-623380625/tmp1993579607/hive-exec-0.14.0.jar
    2015-06-07 18:04:42,116 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/lib/libfb303-0.9.0.jar to DistributedCache through /tmp/temp-623380625/tmp1525816211/libfb303-0.9.0.jar
    2015-06-07 18:04:42,141 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/lib/jdo-api-3.0.1.jar to DistributedCache through /tmp/temp-623380625/tmp1838452442/jdo-api-3.0.1.jar
    2015-06-07 18:04:42,175 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/lib/hive-hbase-handler-0.14.0.jar to DistributedCache through /tmp/temp-623380625/tmp-693606662/hive-hbase-handler-0.14.0.jar
    2015-06-07 18:04:42,208 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/hcatalog/share/hcatalog/hive-hcatalog-core-0.14.0.jar to DistributedCache through /tmp/temp-623380625/tmp-1953607127/hive-hcatalog-core-0.14.0.jar
    2015-06-07 18:04:42,233 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/hcatalog/share/hcatalog/hive-hcatalog-pig-adapter-0.14.0.jar to DistributedCache through /tmp/temp-623380625/tmp-1306112620/hive-hcatalog-pig-adapter-0.14.0.jar
    2015-06-07 18:04:42,275 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig-0.14.0/pig-0.14.0-core-h2.jar to DistributedCache through /tmp/temp-623380625/tmp-2120105410/pig-0.14.0-core-h2.jar
    2015-06-07 18:04:42,308 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig-0.14.0/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp-623380625/tmp-248233088/automaton-1.11-8.jar
    2015-06-07 18:04:42,341 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig-0.14.0/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp-623380625/tmp942003173/antlr-runtime-3.4.jar
    2015-06-07 18:04:42,375 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig-0.14.0/lib/joda-time-2.1.jar to DistributedCache through /tmp/temp-623380625/tmp1959606950/joda-time-2.1.jar
    2015-06-07 18:04:42,385 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
    2015-06-07 18:04:42,436 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
    2015-06-07 18:04:42,437 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 18:04:42,439 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-07 18:04:42,447 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 18:04:42,474 [JobControl] WARN org.apache.hadoop.mapreduce.JobSubmitter - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
    2015-06-07 18:04:42,548 [JobControl] INFO org.apache.hadoop.mapred.FileInputFormat - Total input paths to process : 1
    2015-06-07 18:04:42,548 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
    2015-06-07 18:04:42,716 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
    2015-06-07 18:04:42,774 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1433663908007_0007
    2015-06-07 18:04:42,778 [JobControl] INFO org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
    2015-06-07 18:04:42,827 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1433663908007_0007
    2015-06-07 18:04:42,830 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://stargate:8088/proxy/applicati...63908007_0007/
    2015-06-07 18:04:42,937 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1433663908007_0007
    2015-06-07 18:04:42,937 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases groupeVendus,ordiVendus,ventes
    2015-06-07 18:04:42,938 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: ventes[1,9],ordiVendus[2,13],groupeVendus[3,15] C: R:
    2015-06-07 18:04:42,945 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
    2015-06-07 18:04:42,945 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1433663908007_0007]
    2015-06-07 18:05:17,174 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
    2015-06-07 18:05:17,174 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1433663908007_0007]
    2015-06-07 18:05:32,710 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1433663908007_0007]
    2015-06-07 18:05:33,215 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-07 18:05:33,223 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
    2015-06-07 18:05:33,353 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-07 18:05:33,357 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
    2015-06-07 18:05:33,392 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-07 18:05:33,396 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
    2015-06-07 18:05:33,431 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
    2015-06-07 18:05:33,432 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:

    HadoopVersion PigVersion UserId StartedAt FinishedAt Features
    2.6.0 0.14.0 hduser 2015-06-07 18:04:41 2015-06-07 18:05:33 GROUP_BY,FILTER

    Success!

    Job Stats (time in seconds):
    JobId Maps Reduces MaxMapTime MinMapTime AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs
    job_1433663908007_0007 1 1 15 15 15 15 11 11 11 11 groupeVendus,ordiVendus,ventes GROUP_BY hdfs://stargate:9000/tmp/temp-623380625/tmp-669030939,

    Input(s):
    Successfully read 9 records (12422 bytes) from: "jbedb.ventes"

    Output(s):
    Successfully stored 2 records (148 bytes) in: "hdfs://stargate:9000/tmp/temp-623380625/tmp-669030939"

    Counters:
    Total records written : 2
    Total bytes written : 148
    Spillable Memory Manager spill count : 0
    Total bags proactively spilled: 0
    Total records proactively spilled: 0

    Job DAG:
    job_1433663908007_0007


    2015-06-07 18:05:33,433 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-07 18:05:33,437 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
    2015-06-07 18:05:33,465 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-07 18:05:33,469 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
    2015-06-07 18:05:33,501 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-07 18:05:33,504 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
    2015-06-07 18:05:33,536 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
    2015-06-07 18:05:33,536 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 18:05:33,536 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 18:05:33,537 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
    2015-06-07 18:05:33,544 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
    2015-06-07 18:05:33,544 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
    (Zidane,{(1002,Zidane,Ordinateur,800),(1002,Zidane,Ordinateur,600),(1002,Zidane,Ordinateur,1000)})
    (Platini,{(1001,Platini,Ordinateur,500)})

    grunt>
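
    Each tuple of the grouped relation is the group key followed by a bag holding every matching input tuple, which is exactly what the two lines above show. DESCRIBE makes that structure explicit; a minimal sketch, assuming the jbedb.ventes columns are custid int, custname string, producttype string and value int (the real names and types come from the Hive table):
    grunt> DESCRIBE groupeVendus;
    groupeVendus: {group: chararray, ordiVendus: {(custid: int, custname: chararray, producttype: chararray, value: int)}}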

  9. #9
    Using FOREACH to compute the total of the grouped sales per customer.

    We apply the FOREACH:

    grunt> custTotalVendus = FOREACH groupeVendus GENERATE group as custname, SUM( ordiVendus.(value)) as value;
    2015-06-07 18:20:59,717 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 18:20:59,718 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 18:20:59,747 [main] WARN org.apache.hadoop.hive.conf.HiveConf - DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
    2015-06-07 18:20:59,748 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.attempts does not exist
    2015-06-07 18:20:59,748 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.interval does not exist
    2015-06-07 18:20:59,748 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.stats.map.parallelism does not exist
    2015-06-07 18:20:59,806 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 18:20:59,806 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 18:20:59,831 [main] WARN org.apache.hadoop.hive.conf.HiveConf - DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
    2015-06-07 18:20:59,831 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.attempts does not exist
    2015-06-07 18:20:59,831 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.interval does not exist
    2015-06-07 18:20:59,831 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.stats.map.parallelism does not exist

    We dump the result:
    grunt> dump custTotalVendus;
    2015-06-07 18:24:28,776 [main] WARN org.apache.hadoop.hive.conf.HiveConf - DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
    2015-06-07 18:24:28,776 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.attempts does not exist
    2015-06-07 18:24:28,776 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.interval does not exist
    2015-06-07 18:24:28,776 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.stats.map.parallelism does not exist
    2015-06-07 18:24:28,798 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 18:24:28,798 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 18:24:28,800 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY,FILTER
    2015-06-07 18:24:28,816 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 18:24:28,816 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 18:24:28,816 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
    2015-06-07 18:24:28,816 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
    2015-06-07 18:24:28,819 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
    2015-06-07 18:24:28,820 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.CombinerOptimizerUtil - Choosing to move algebraic foreach to combiner
    2015-06-07 18:24:28,822 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
    2015-06-07 18:24:28,823 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
    2015-06-07 18:24:28,830 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 18:24:28,831 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-07 18:24:28,833 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
    2015-06-07 18:24:28,834 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
    2015-06-07 18:24:28,854 [main] WARN org.apache.hadoop.hive.conf.HiveConf - DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
    2015-06-07 18:24:28,854 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.attempts does not exist
    2015-06-07 18:24:28,854 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.interval does not exist
    2015-06-07 18:24:28,854 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.stats.map.parallelism does not exist
    2015-06-07 18:24:28,880 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 18:24:28,880 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 18:24:28,880 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
    2015-06-07 18:24:28,880 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
    2015-06-07 18:24:28,902 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=0
    2015-06-07 18:24:28,902 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
    2015-06-07 18:24:28,902 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
    2015-06-07 18:24:28,902 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
    2015-06-07 18:24:28,974 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/lib/hive-metastore-0.14.0.jar to DistributedCache through /tmp/temp-623380625/tmp-37665561/hive-metastore-0.14.0.jar
    2015-06-07 18:24:29,008 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/lib/libthrift-0.9.0.jar to DistributedCache through /tmp/temp-623380625/tmp-137153112/libthrift-0.9.0.jar
    2015-06-07 18:24:29,099 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/lib/hive-exec-0.14.0.jar to DistributedCache through /tmp/temp-623380625/tmp389340284/hive-exec-0.14.0.jar
    2015-06-07 18:24:29,132 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/lib/libfb303-0.9.0.jar to DistributedCache through /tmp/temp-623380625/tmp-1697339981/libfb303-0.9.0.jar
    2015-06-07 18:24:29,166 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/lib/jdo-api-3.0.1.jar to DistributedCache through /tmp/temp-623380625/tmp1397486110/jdo-api-3.0.1.jar
    2015-06-07 18:24:29,191 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/lib/hive-hbase-handler-0.14.0.jar to DistributedCache through /tmp/temp-623380625/tmp-17183688/hive-hbase-handler-0.14.0.jar
    2015-06-07 18:24:29,224 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/hcatalog/share/hcatalog/hive-hcatalog-core-0.14.0.jar to DistributedCache through /tmp/temp-623380625/tmp185634080/hive-hcatalog-core-0.14.0.jar
    2015-06-07 18:24:29,249 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/hcatalog/share/hcatalog/hive-hcatalog-pig-adapter-0.14.0.jar to DistributedCache through /tmp/temp-623380625/tmp1947741449/hive-hcatalog-pig-adapter-0.14.0.jar
    2015-06-07 18:24:29,291 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig-0.14.0/pig-0.14.0-core-h2.jar to DistributedCache through /tmp/temp-623380625/tmp-541595554/pig-0.14.0-core-h2.jar
    2015-06-07 18:24:29,324 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig-0.14.0/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp-623380625/tmp2124757516/automaton-1.11-8.jar
    2015-06-07 18:24:29,357 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig-0.14.0/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp-623380625/tmp2107909947/antlr-runtime-3.4.jar
    2015-06-07 18:24:29,391 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig-0.14.0/lib/joda-time-2.1.jar to DistributedCache through /tmp/temp-623380625/tmp-833660242/joda-time-2.1.jar
    2015-06-07 18:24:29,399 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
    2015-06-07 18:24:29,400 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
    2015-06-07 18:24:29,400 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
    2015-06-07 18:24:29,400 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
    2015-06-07 18:24:29,438 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
    2015-06-07 18:24:29,438 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 18:24:29,440 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-07 18:24:29,446 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 18:24:29,474 [JobControl] WARN org.apache.hadoop.mapreduce.JobSubmitter - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
    2015-06-07 18:24:29,539 [JobControl] INFO org.apache.hadoop.mapred.FileInputFormat - Total input paths to process : 1
    2015-06-07 18:24:29,539 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
    2015-06-07 18:24:29,640 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
    2015-06-07 18:24:29,690 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1433663908007_0009
    2015-06-07 18:24:29,695 [JobControl] INFO org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
    2015-06-07 18:24:29,750 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1433663908007_0009
    2015-06-07 18:24:29,753 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://stargate:8088/proxy/applicati...63908007_0009/
    2015-06-07 18:24:29,939 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1433663908007_0009
    2015-06-07 18:24:29,939 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases custTotalVendus,groupeVendus,ordiVendus,ventes
    2015-06-07 18:24:29,939 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: ventes[1,9],ordiVendus[2,13],custTotalVendus[4,18],groupeVendus[3,15] C: custTotalVendus[4,18],groupeVendus[3,15] R: custTotalVendus[4,18]
    2015-06-07 18:24:29,945 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
    2015-06-07 18:24:29,945 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1433663908007_0009]
    2015-06-07 18:25:02,469 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
    2015-06-07 18:25:02,469 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1433663908007_0009]
    2015-06-07 18:25:07,479 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1433663908007_0009]
    2015-06-07 18:25:10,489 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-07 18:25:10,494 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
    2015-06-07 18:25:10,618 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-07 18:25:10,621 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
    2015-06-07 18:25:10,671 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-07 18:25:10,675 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
    2015-06-07 18:25:10,707 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
    2015-06-07 18:25:10,707 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:

    HadoopVersion PigVersion UserId StartedAt FinishedAt Features
    2.6.0 0.14.0 hduser 2015-06-07 18:24:28 2015-06-07 18:25:10 GROUP_BY,FILTER

    Success!

    Job Stats (time in seconds):
    JobId Maps Reduces MaxMapTime MinMapTime AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs
    job_1433663908007_0009 1 1 13 13 13 13 3 3 3 3 custTotalVendus,groupeVendus,ordiVendus,ventes GROUP_BY,COMBINER hdfs://stargate:9000/tmp/temp-623380625/tmp-830833194,

    Input(s):
    Successfully read 9 records (12422 bytes) from: "jbedb.ventes"

    Output(s):
    Successfully stored 2 records (33 bytes) in: "hdfs://stargate:9000/tmp/temp-623380625/tmp-830833194"

    Counters:
    Total records written : 2
    Total bytes written : 33
    Spillable Memory Manager spill count : 0
    Total bags proactively spilled: 0
    Total records proactively spilled: 0

    Job DAG:
    job_1433663908007_0009


    2015-06-07 18:25:10,709 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-07 18:25:10,713 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
    2015-06-07 18:25:10,740 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-07 18:25:10,743 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
    2015-06-07 18:25:10,775 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-07 18:25:10,778 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
    2015-06-07 18:25:10,813 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
    2015-06-07 18:25:10,814 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 18:25:10,814 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 18:25:10,814 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
    2015-06-07 18:25:10,821 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
    2015-06-07 18:25:10,821 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
    (Zidane,2400)
    (Platini,500)
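
    As a quick check against the earlier dump: Zidane's three Ordinateur rows sum to 1000 + 600 + 800 = 2400, and Platini's single row gives 500.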

  10. #10
    The loop is now closed: after the LOAD, the FILTER, the GROUP BY and the FOREACH, we save the result into a Hive table, which must match the format and the types produced by Pig; in this particular case, a set of rows of type (string, bigint).

    One surprise: at STORE time you cannot name the fields; it has to be done beforehand in the FOREACH, for example group AS custname, SUM(x) AS value; otherwise the STORE is rejected by Hive. That is not clearly documented.
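
    To make that concrete, a minimal sketch reusing the aliases from this thread (the DESCRIBE line is what Pig should report, assuming the value column is an int on the Hive side, since SUM over an int yields a long, which maps to bigint in Hive):
    grunt> custTotalVendus = FOREACH groupeVendus GENERATE group AS custname, SUM(ordiVendus.value) AS value;
    grunt> DESCRIBE custTotalVendus;
    custTotalVendus: {custname: chararray, value: long}
    grunt> STORE custTotalVendus INTO 'jbedb.ventesclient' USING org.apache.hive.hcatalog.pig.HCatStorer();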

    Before launching the Pig STORE, a result table has to be created on the Hive side:
    0: jdbc:hive2://stargate:10000/jbedb> create table ventesclient ( custname String, value bigint) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
    No rows affected (1,041 seconds)
    0: jdbc:hive2://stargate:10000/jbedb> select * from ventesclient;
    +------------------------+---------------------+--+
    | ventesclient.custname | ventesclient.value |
    +------------------------+---------------------+--+

    Now we can launch the STORE on the Pig side:
    grunt> STORE custTotalVendus INTO 'jbedb.ventesclient' USING org.apache.hive.hcatalog.pig.HCatStorer();
    2015-06-07 19:28:09,295 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 19:28:09,295 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 19:28:09,328 [main] WARN org.apache.hadoop.hive.conf.HiveConf - DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
    2015-06-07 19:28:09,328 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.attempts does not exist
    2015-06-07 19:28:09,328 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.interval does not exist
    2015-06-07 19:28:09,328 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.stats.map.parallelism does not exist
    2015-06-07 19:28:09,329 [main] INFO hive.metastore - Trying to connect to metastore with URI thrift://stargate:9083
    2015-06-07 19:28:09,329 [main] INFO hive.metastore - Connected to metastore.
    2015-06-07 19:28:09,367 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 19:28:09,367 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 19:28:09,385 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 19:28:09,385 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 19:28:09,421 [main] WARN org.apache.hadoop.hive.conf.HiveConf - DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
    2015-06-07 19:28:09,422 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.attempts does not exist
    2015-06-07 19:28:09,422 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.interval does not exist
    2015-06-07 19:28:09,422 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.stats.map.parallelism does not exist
    2015-06-07 19:28:09,446 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 19:28:09,446 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 19:28:09,472 [main] WARN org.apache.hadoop.hive.conf.HiveConf - DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
    2015-06-07 19:28:09,472 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.attempts does not exist
    2015-06-07 19:28:09,472 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.interval does not exist
    2015-06-07 19:28:09,472 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.stats.map.parallelism does not exist
    2015-06-07 19:28:09,512 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
    2015-06-07 19:28:09,528 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 19:28:09,531 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 19:28:09,572 [main] WARN org.apache.hadoop.hive.conf.HiveConf - DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
    2015-06-07 19:28:09,572 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.attempts does not exist
    2015-06-07 19:28:09,572 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.interval does not exist
    2015-06-07 19:28:09,572 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.stats.map.parallelism does not exist
    2015-06-07 19:28:09,589 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 19:28:09,590 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 19:28:09,632 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY,FILTER
    2015-06-07 19:28:09,652 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 19:28:09,652 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 19:28:09,653 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
    2015-06-07 19:28:09,653 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
    2015-06-07 19:28:09,665 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
    2015-06-07 19:28:09,668 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.CombinerOptimizerUtil - Choosing to move algebraic foreach to combiner
    2015-06-07 19:28:09,680 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
    2015-06-07 19:28:09,680 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
    2015-06-07 19:28:09,689 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 19:28:09,690 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-07 19:28:09,693 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
    2015-06-07 19:28:09,693 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
    2015-06-07 19:28:09,717 [main] WARN org.apache.hadoop.hive.conf.HiveConf - DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
    2015-06-07 19:28:09,718 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.attempts does not exist
    2015-06-07 19:28:09,718 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.interval does not exist
    2015-06-07 19:28:09,718 [main] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.stats.map.parallelism does not exist
    2015-06-07 19:28:09,751 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 19:28:09,751 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 19:28:09,751 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
    2015-06-07 19:28:09,751 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
    2015-06-07 19:28:09,778 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=0
    2015-06-07 19:28:09,778 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
    2015-06-07 19:28:09,778 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
    2015-06-07 19:28:09,778 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
    2015-06-07 19:28:09,869 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/lib/hive-metastore-0.14.0.jar to DistributedCache through /tmp/temp1318958339/tmp-782314409/hive-metastore-0.14.0.jar
    2015-06-07 19:28:09,902 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/lib/libthrift-0.9.0.jar to DistributedCache through /tmp/temp1318958339/tmp-237449792/libthrift-0.9.0.jar
    2015-06-07 19:28:09,986 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/lib/hive-exec-0.14.0.jar to DistributedCache through /tmp/temp1318958339/tmp-1645679962/hive-exec-0.14.0.jar
    2015-06-07 19:28:10,019 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/lib/libfb303-0.9.0.jar to DistributedCache through /tmp/temp1318958339/tmp754678357/libfb303-0.9.0.jar
    2015-06-07 19:28:10,052 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/lib/jdo-api-3.0.1.jar to DistributedCache through /tmp/temp1318958339/tmp50696003/jdo-api-3.0.1.jar
    2015-06-07 19:28:10,086 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/lib/hive-hbase-handler-0.14.0.jar to DistributedCache through /tmp/temp1318958339/tmp461403586/hive-hbase-handler-0.14.0.jar
    2015-06-07 19:28:10,119 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/hcatalog/share/hcatalog/hive-hcatalog-core-0.14.0.jar to DistributedCache through /tmp/temp1318958339/tmp1556016879/hive-hcatalog-core-0.14.0.jar
    2015-06-07 19:28:10,144 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/hcatalog/share/hcatalog/hive-hcatalog-pig-adapter-0.14.0.jar to DistributedCache through /tmp/temp1318958339/tmp-703920126/hive-hcatalog-pig-adapter-0.14.0.jar
    2015-06-07 19:28:10,186 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig-0.14.0/pig-0.14.0-core-h2.jar to DistributedCache through /tmp/temp1318958339/tmp1338067857/pig-0.14.0-core-h2.jar
    2015-06-07 19:28:10,219 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig-0.14.0/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp1318958339/tmp-185389387/automaton-1.11-8.jar
    2015-06-07 19:28:10,252 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig-0.14.0/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp1318958339/tmp-1567315164/antlr-runtime-3.4.jar
    2015-06-07 19:28:10,285 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig-0.14.0/lib/joda-time-2.1.jar to DistributedCache through /tmp/temp1318958339/tmp-1207475514/joda-time-2.1.jar
    2015-06-07 19:28:10,297 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
    2015-06-07 19:28:10,298 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
    2015-06-07 19:28:10,298 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
    2015-06-07 19:28:10,299 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
    2015-06-07 19:28:10,382 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
    2015-06-07 19:28:10,382 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 19:28:10,384 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-07 19:28:10,392 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 19:28:10,427 [JobControl] WARN org.apache.hadoop.hive.conf.HiveConf - DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
    2015-06-07 19:28:10,427 [JobControl] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.attempts does not exist
    2015-06-07 19:28:10,427 [JobControl] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.metastore.ds.retry.interval does not exist
    2015-06-07 19:28:10,427 [JobControl] WARN org.apache.hadoop.hive.conf.HiveConf - HiveConf of name hive.stats.map.parallelism does not exist
    2015-06-07 19:28:10,440 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-07 19:28:10,440 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-07 19:28:10,441 [JobControl] INFO hive.metastore - Trying to connect to metastore with URI thrift://stargate:9083
    2015-06-07 19:28:10,441 [JobControl] INFO hive.metastore - Connected to metastore.
    2015-06-07 19:28:10,454 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
    2015-06-07 19:28:10,485 [JobControl] WARN org.apache.hadoop.mapreduce.JobSubmitter - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
    2015-06-07 19:28:10,555 [JobControl] INFO org.apache.hadoop.mapred.FileInputFormat - Total input paths to process : 1
    2015-06-07 19:28:10,555 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
    2015-06-07 19:28:10,651 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
    2015-06-07 19:28:10,710 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1433663908007_0016
    2015-06-07 19:28:10,715 [JobControl] INFO org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
    2015-06-07 19:28:10,783 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1433663908007_0016
    2015-06-07 19:28:10,786 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://stargate:8088/proxy/applicati...63908007_0016/
    2015-06-07 19:28:10,883 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1433663908007_0016
    2015-06-07 19:28:10,883 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases custTotalVendus,groupeVendus,ordiVendus,ventes
    2015-06-07 19:28:10,883 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: ventes[1,9],ordiVendus[2,13],custTotalVendus[5,18],groupeVendus[3,15] C: custTotalVendus[5,18],groupeVendus[3,15] R: custTotalVendus[5,18]
    2015-06-07 19:28:10,889 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
    2015-06-07 19:28:10,890 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1433663908007_0016]
    2015-06-07 19:28:35,693 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
    2015-06-07 19:28:35,693 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1433663908007_0016]
    2015-06-07 19:28:53,232 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1433663908007_0016]
    2015-06-07 19:28:56,241 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-07 19:28:56,248 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
    2015-06-07 19:28:56,372 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-07 19:28:56,377 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
    2015-06-07 19:28:56,415 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-07 19:28:56,419 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
    2015-06-07 19:28:56,447 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
    2015-06-07 19:28:56,447 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:

    HadoopVersion PigVersion UserId StartedAt FinishedAt Features
    2.6.0 0.14.0 hduser 2015-06-07 19:28:09 2015-06-07 19:28:56 GROUP_BY,FILTER

    Success!

    Job Stats (time in seconds):
    JobId Maps Reduces MaxMapTime MinMapTime AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs
    job_1433663908007_0016 1 1 4 4 4 4 14 14 14 14 custTotalVendus,groupeVendus,ordiVendus,ventes GROUP_BY,COMBINER jbedb.ventesclient,

    Input(s):
    Successfully read 9 records (12422 bytes) from: "jbedb.ventes"

    Output(s):
    Successfully stored 2 records (24 bytes) in: "jbedb.ventesclient"

    Counters:
    Total records written : 2
    Total bytes written : 24
    Spillable Memory Manager spill count : 0
    Total bags proactively spilled: 0
    Total records proactively spilled: 0

    Job DAG:
    job_1433663908007_0016


    2015-06-07 19:28:56,448 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-07 19:28:56,452 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
    2015-06-07 19:28:56,481 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-07 19:28:56,485 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
    2015-06-07 19:28:56,512 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-07 19:28:56,517 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
    2015-06-07 19:28:56,548 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
    grunt>


    In Hive, this gives a recap of the sales with a total per customer:

    0: jdbc:hive2://stargate:10000/jbedb> select * from ventesclient;
    +------------------------+---------------------+--+
    | ventesclient.custname | ventesclient.value |
    +------------------------+---------------------+--+
    | Zidane | 2400 |
    | Platini | 500 |
    +------------------------+---------------------+--+
    2 rows selected (0,581 seconds)
    0: jdbc:hive2://stargate:10000/jbedb>

    The complete Pig script; it's not very big:
    ventes = LOAD 'jbedb.ventes' USING org.apache.hive.hcatalog.pig.HCatLoader();
    ordiVendus = FILTER ventes BY producttype == 'Ordinateur';
    dump ordiVendus;
    groupeVendus = GROUP ordiVendus BY custname;
    dump groupeVendus;
    custTotalVendus = FOREACH groupeVendus GENERATE group as custname, SUM( ordiVendus.(value)) as value;
    dump custTotalVendus;
    STORE custTotalVendus INTO 'jbedb.ventesclient' USING org.apache.hive.hcatalog.pig.HCatStorer();
    Conclusion: Pig is a simplified language and relatively easy to learn once you have a correctly installed and configured platform; it's an alternative to the more complex MapReduce programming in Java.

    I still have a few functions to look at before tackling the big piece, Java MapReduce.

    To be continued

  11. #11
    We can do the same thing with HBase. I'll skip reposting the logs, which are almost identical, but there is one subtlety: you need an id column as the primary key, with a unique, sequence-like value, to avoid data being overwritten through custId. Hive does this as an append, whereas HBase behaves as insert/update. With HBase my example works less well, because custId is only 1001 or 1002, so we end up with just 2 records instead of 10: the last two writes overwrite all the other records sharing the same row key (illustrated by the small Java sketch below).
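
    To see this overwrite outside of Pig, here is a minimal Java sketch (my own illustration, assuming the old HBase 0.98 client API used on this cluster; it is not part of the Pig/Hive pipeline above): two Puts on the same row key, where the second silently replaces the first because the column family keeps a single version.

    package jbetest;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    // Sketch: demonstrates why a non-unique row key (custId) loses rows in HBase.
    public class HBaseOverwriteDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();   // picks up hbase-site.xml
            HTable table = new HTable(conf, "hbventes");         // 0.98-style client API
            try {
                Put first = new Put(Bytes.toBytes("1002"));      // row key = custId
                first.add(Bytes.toBytes("f"), Bytes.toBytes("value"), Bytes.toBytes("1000"));
                table.put(first);

                Put second = new Put(Bytes.toBytes("1002"));     // same row key...
                second.add(Bytes.toBytes("f"), Bytes.toBytes("value"), Bytes.toBytes("800"));
                table.put(second);                               // ...so only this value remains visible
            } finally {
                table.close();
            }
        }
    }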

    Hive: creation, from Hive, of an HBase (row, column) table with a Hive mapping, so the HBase data can be manipulated in an SQL-like way.

    CREATE TABLE hbventes( custId int, custName String, productType String, value Int ) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,f:custName,f:productType,f:value') TBLPROPERTIES ('hbase.table.name' = 'hbventes');


    pig:

    we use HBaseStorage with the HBase/Pig field mapping:

    ventes = LOAD 'hbase://hbventes' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('f:custName f:productType f:value', '-loadKey true -limit 5') as ( custId:bytearray, custName:chararray, producttype:chararray, value:int );


    The row key is implicitly placed in the first mapped field, custId in this case. Field names are case sensitive; if the names are wrong, the fields come back empty.

    What follows is the same as what was done for Hive. We could also treat it as a Hive-HBase table via HCatalog, but from what I observed it is much slower than going direct.

    For the STORE, same approach: we create a result table from Hive on top of HBase.

    Hive
    CREATE TABLE hbventesclient( custName String, value Int) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,f:value') TBLPROPERTIES ('hbase.table.name' = 'hbventesclient');
    No rows affected (0,983 seconds)

    pig:
    custTotalVendus = FOREACH groupeVendus GENERATE group as custName, SUM( ordiVendus.(value)) as value;

    STORE custTotalVendus INTO 'hbase://hbventesclient' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage( 'f:value');

    The custName key is stored implicitly.

    ./hbase shell

    Scan of the tables with t.scan in the hbase shell:

    hbase(main):008:0> t=get_table 'hbventes'
    base(main):011:0> t.describe
    DESCRIPTION ENABLED
    'hbventes', {NAME => 'f', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS true
    => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'false', BLOCKSIZE =
    > '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
    1 row(s) in 0.0900 seconds
    hbase(main):012:0> t.scan
    ROW COLUMN+CELL
    1001 column=f:custName, timestamp=1433789800767, value=Platini
    1001 column=f:productType, timestamp=1433789800767, value=Menage
    1001 column=f:value, timestamp=1433789800767, value=700
    1002 column=f:custName, timestamp=1433789800767, value=Zidane
    1002 column=f:productType, timestamp=1433789800767, value=Ordinateur
    1002 column=f:value, timestamp=1433789800767, value=800
    2 row(s) in 0.0840 seconds
    hbase(main):013:0>

    hbase(main):008:0> t=get_table 'hbventesclient'
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/usr/local/hbase-0.98.4-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    2015-06-08 23:13:31,897 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    0 row(s) in 0.8650 seconds

    => Hbase::Table - hbventesclient
    hbase(main):009:0> t.scan
    ROW COLUMN+CELL
    Zidane column=f:value, timestamp=1433797707423, value=800
    1 row(s) in 0.2330 seconds

    To be continued

  12. #12
    Joins in Pig. I made two dummy CSV files, no tables: employe and service.

    The join is done on serviceId.
    /tmp/employe.csv
    1,Bordenave,10,M
    2,Dupond,12,M
    3,Lexa,11,F
    4,Doig,10,F

    /tmp/service.csv
    10,sales
    11,purchase
    12,inventory

    The two files are copied to HDFS with the hadoop command:
    hadoop fs -put -f /tmp/employe.csv /tmp
    hadoop fs -put -f /tmp/service.csv /tmp

    pig -x mapreduce -useHCatalog
    grunt> empData = LOAD '/tmp/employe.csv' USING PigStorage(',') AS ( empId:int, empNom:chararray, serviceId:int, genre:chararray);
    2015-06-10 20:20:49,949 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-10 20:20:49,949 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    grunt> serviceData = LOAD '/tmp/service.csv' USING PigStorage(',') AS ( serviceId:int, serviceNom:chararray);
    2015-06-10 20:20:57,628 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-10 20:20:57,628 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    grunt> joinEmpService = JOIN serviceData by serviceId, empData by serviceId;
    grunt> describe joinEmpService;
    joinEmpService: {serviceData::serviceId: int,serviceData::serviceNom: chararray,empData::empId: int,empData::empNom: chararray,empData::serviceId: int,empData::genre: chararray}
    grunt>
    grunt> dump joinEmpService;
    2015-06-10 20:21:15,190 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: HASH_JOIN
    2015-06-10 20:21:15,210 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-10 20:21:15,210 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-10 20:21:15,210 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
    2015-06-10 20:21:15,210 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
    2015-06-10 20:21:15,215 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
    2015-06-10 20:21:15,217 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler$LastInputStreamingOptimizer - Rewrite: POPackage->POForEach to POPackage(JoinPackager)
    2015-06-10 20:21:15,217 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
    2015-06-10 20:21:15,217 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
    2015-06-10 20:21:15,225 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-10 20:21:15,227 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-10 20:21:15,228 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
    2015-06-10 20:21:15,229 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
    2015-06-10 20:21:15,229 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
    2015-06-10 20:21:15,229 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
    2015-06-10 20:21:15,233 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=91
    2015-06-10 20:21:15,233 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
    2015-06-10 20:21:15,233 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
    2015-06-10 20:21:15,299 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/lib/hive-metastore-0.14.0.jar to DistributedCache through /tmp/temp103144561/tmp-1252284296/hive-metastore-0.14.0.jar
    2015-06-10 20:21:15,332 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/lib/libthrift-0.9.0.jar to DistributedCache through /tmp/temp103144561/tmp-364917371/libthrift-0.9.0.jar
    2015-06-10 20:21:15,424 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/lib/hive-exec-0.14.0.jar to DistributedCache through /tmp/temp103144561/tmp47921849/hive-exec-0.14.0.jar
    2015-06-10 20:21:15,457 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/lib/libfb303-0.9.0.jar to DistributedCache through /tmp/temp103144561/tmp227102308/libfb303-0.9.0.jar
    2015-06-10 20:21:15,491 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/lib/jdo-api-3.0.1.jar to DistributedCache through /tmp/temp103144561/tmp-209625498/jdo-api-3.0.1.jar
    2015-06-10 20:21:15,524 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/lib/hive-hbase-handler-0.14.0.jar to DistributedCache through /tmp/temp103144561/tmp1304119021/hive-hbase-handler-0.14.0.jar
    2015-06-10 20:21:15,557 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/hcatalog/share/hcatalog/hive-hcatalog-core-0.14.0.jar to DistributedCache through /tmp/temp103144561/tmp-828728074/hive-hcatalog-core-0.14.0.jar
    2015-06-10 20:21:15,582 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/hive/hcatalog/share/hcatalog/hive-hcatalog-pig-adapter-0.14.0.jar to DistributedCache through /tmp/temp103144561/tmp1761929676/hive-hcatalog-pig-adapter-0.14.0.jar
    2015-06-10 20:21:15,624 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig-0.14.0/pig-0.14.0-core-h2.jar to DistributedCache through /tmp/temp103144561/tmp-1160520263/pig-0.14.0-core-h2.jar
    2015-06-10 20:21:15,657 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig-0.14.0/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp103144561/tmp-1409813105/automaton-1.11-8.jar
    2015-06-10 20:21:15,682 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig-0.14.0/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp103144561/tmp-747331071/antlr-runtime-3.4.jar
    2015-06-10 20:21:15,715 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig-0.14.0/lib/joda-time-2.1.jar to DistributedCache through /tmp/temp103144561/tmp-1024208495/joda-time-2.1.jar
    2015-06-10 20:21:15,724 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
    2015-06-10 20:21:15,725 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
    2015-06-10 20:21:15,725 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
    2015-06-10 20:21:15,725 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
    2015-06-10 20:21:15,743 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
    2015-06-10 20:21:15,745 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-10 20:21:15,782 [JobControl] WARN org.apache.hadoop.mapreduce.JobSubmitter - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
    2015-06-10 20:21:15,845 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
    2015-06-10 20:21:15,845 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
    2015-06-10 20:21:15,847 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
    2015-06-10 20:21:15,854 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
    2015-06-10 20:21:15,854 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
    2015-06-10 20:21:15,856 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
    2015-06-10 20:21:15,957 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:2
    2015-06-10 20:21:16,007 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1433956492399_0009
    2015-06-10 20:21:16,011 [JobControl] INFO org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
    2015-06-10 20:21:16,066 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1433956492399_0009
    2015-06-10 20:21:16,070 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://stargate:8088/proxy/applicati...56492399_0009/
    2015-06-10 20:21:16,244 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1433956492399_0009
    2015-06-10 20:21:16,244 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases empData,joinEmpService,serviceData
    2015-06-10 20:21:16,244 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: serviceData[8,14],serviceData[-1,-1],joinEmpService[9,17],empData[7,10],empData[-1,-1],joinEmpService[9,17] C: R:
    2015-06-10 20:21:16,249 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
    2015-06-10 20:21:16,249 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1433956492399_0009]
    2015-06-10 20:21:28,265 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
    2015-06-10 20:21:28,265 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1433956492399_0009]
    2015-06-10 20:21:43,285 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1433956492399_0009]
    2015-06-10 20:21:46,293 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-10 20:21:46,300 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
    2015-06-10 20:21:46,458 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-10 20:21:46,463 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
    2015-06-10 20:21:46,505 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-10 20:21:46,510 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
    2015-06-10 20:21:46,542 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
    2015-06-10 20:21:46,542 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:

    HadoopVersion PigVersion UserId StartedAt FinishedAt Features
    2.6.0 0.14.0 hduser 2015-06-10 20:21:15 2015-06-10 20:21:46 HASH_JOIN

    Success!

    Job Stats (time in seconds):
    JobId Maps Reduces MaxMapTime MinMapTime AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs
    job_1433956492399_0009 2 1 3 3 3 3 12 12 12 12 empData,joinEmpService,serviceData HASH_JOIN hdfs://stargate:9000/tmp/temp103144561/tmp-1484661101,

    Input(s):
    Successfully read 5 records from: "/tmp/employe.csv"
    Successfully read 3 records from: "/tmp/service.csv"

    Output(s):
    Successfully stored 4 records (131 bytes) in: "hdfs://stargate:9000/tmp/temp103144561/tmp-1484661101"

    Counters:
    Total records written : 4
    Total bytes written : 131
    Spillable Memory Manager spill count : 0
    Total bags proactively spilled: 0
    Total records proactively spilled: 0

    Job DAG:
    job_1433956492399_0009


    2015-06-10 20:21:46,543 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-10 20:21:46,547 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
    2015-06-10 20:21:46,579 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-10 20:21:46,582 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
    2015-06-10 20:21:46,610 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at stargate/192.168.0.11:8032
    2015-06-10 20:21:46,614 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
    2015-06-10 20:21:46,649 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Encountered Warning ACCESSING_NON_EXISTENT_FIELD 3 time(s).
    2015-06-10 20:21:46,649 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
    2015-06-10 20:21:46,649 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2015-06-10 20:21:46,650 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2015-06-10 20:21:46,650 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
    2015-06-10 20:21:46,657 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
    2015-06-10 20:21:46,658 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
    (10,sales ,4,Doig,10,F)
    (10,sales ,1,Bordenave,10,M)
    (11,purchase,3,Lexa,11,F)
    (12,inventory,2,Dupond,12,M)

  13. #13
    ORDER BY in Pig, on a Hive table via the HCatalog metastore:

    ventes = LOAD 'jbedb.ventes' USING org.apache.hive.hcatalog.pig.HCatLoader;
    ordreVentes = ORDER ventes by value DESC;
    dump ordreVentes;
    Using the DataFu toolbox:

    http://datafu.incubator.apache.org/d...tatistics.html

    Creation of an input file:
    0
    1
    2
    3
    4
    3
    2
    Copy to HDFS with hadoop fs -put -f input /tmp

    Computing the median:
    REGISTER /usr/local/datafu/lib/datafu-1.2.0.jar
    DEFINE Median datafu.pig.stats.StreamingMedian();
    data = LOAD '/tmp/input' using PigStorage() as (val:int);
    data = FOREACH (GROUP data ALL) GENERATE Median(data);
    dump data;
    Computing quantiles:
    ventes = LOAD 'jbedb.ventes' USING org.apache.hive.hcatalog.pig.HCatLoader;
    ordreVentes = ORDER ventes by value DESC;
    dump ordreVentes;
    2015-06-10 21:45:47,821 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
    (1001,Platini,Menage,2000,)
    (1002,Zidane,Menage,1000,)
    (1002,Zidane,Ordinateur,1000,)
    (1002,Zidane,Ordinateur,800,)
    (1001,Platini,Menage,700,)
    (1002,Zidane,Ordinateur,600,)
    (1001,Platini,Menage,600,)
    (1001,Platini,Ordinateur,500,)
    (1002,Zidane,Menage,500,)
    REGISTER /usr/local/datafu/lib/datafu-1.2.0.jar
    DEFINE Quantile datafu.pig.stats.Quantile( '0.0', '0.25', '0.5', '0.75', '1.0' );
    quantData = FOREACH ( GROUP ordreVentes ALL ) GENERATE Quantile(ordreVentes.value);
    describe quantData;
    quantData: {(quantile_0_0: double,quantile_0_25: double,quantile_0_5: double,quantile_0_75: double,quantile_1_0: double)}

    dump quantData;
    Log extract after the computation:
    2015-06-10 21:30:40,541 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
    ((500.0,600.0,700.0,1000.0,2000.0))

  14. #14
    Using UDFs (user-defined functions) in Pig

    http://pig.apache.org/docs/r0.14.0/udf.html


    REGISTER /home/hduser/udf/jbetest/genreLibelle.jar
    empData = LOAD '/tmp/employe.csv' USING PigStorage(',') AS ( empId:int, empNom:chararray, serviceId:int, genre:chararray);
    dump empData
    (1,Bordenave,10,M)
    (2,Dupond,12,M)
    (3,Lexa,11,F)
    (4,Doig,10,F)
    (,,,)
    We're going to apply a UDF to display Homme/Femme instead of M/F.

    genre = FOREACH empData generate empId, jbetest.genreLibelle(genre);

    dump genre
    2015-06-10 23:09:44,158 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
    (1,Homme)
    (2,Homme)
    (3,Femme)
    (4,Femme)
    (,)
    grunt>
    The Java program. One subtlety: I have an empty line in my file, so I'm forced to test for null.

    package jbetest;

    import java.io.IOException;

    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    // Pig EvalFunc: turns the genre code (F / anything else) into a label (Femme / Homme).
    public class genreLibelle extends EvalFunc<String> {

        public String exec(Tuple input) throws IOException {
            // Nothing in the tuple: nothing to translate.
            if (input == null || input.size() == 0) return null;

            try {
                String str = (String) input.get(0);
                // Null field (the blank line in the CSV): keep it null.
                if (str == null) return null;
                if (str.equals("F")) {
                    return "Femme";
                } else {
                    return "Homme";
                }
            } catch (Exception e) {
                throw new IOException("caught exception processing input row", e);
            }
        }
    }
    ./compilepig.sh jbetest/genreLibelle.java

    The compilepig.sh script:

    #!/bin/bash
    set -x
    if [ "$1" == "" ]; then
    echo "Usage: $0 <java file>"
    exit 1
    fi

    CNAME=${1%.java}
    JARNAME=$CNAME.jar
    JARDIR=/tmp/pig_jars/$CNAME
    CLASSPATH=$(ls $PIG_HOME/pig*h1.jar):$(ls $PIG_HOME/pig*h2.jar):$(ls $HADOOP_HOME/share/hadoop/common/hadoop-common-?.?.?.jar)

    mkdir -p $JARDIR
    javac -classpath $CLASSPATH -d $JARDIR/ $1 && jar -cf $JARNAME -C $JARDIR/ .
    To be continued

  15. #15
    First Java MapReduce. I didn't want to do a word count like the ones you find everywhere, done to death.

    I made a test CSV file, one quiz per candidate with a score per question; I'm going to aggregate these results in the output file. I use ':' as the separator.

    cat quizcandidat.csv
    1:S DeSuza:Q4:0
    2:JP Durand:Q2:2
    3 dupond:Q3:0
    2:JP Durand:Q4:1
    2:JP Durand:Q5:5
    3:C. Martin:Q1:0
    2:JP Durand:Q2:4
    2:JP Durand:Q3:1
    1:S DeSuza:Q1:1
    2:JP Durand:Q4:0
    2:JP Durand:Q5:1
    3 dupond:Q1:2
    3 dupond:Q2:3
    2:JP Durand:Q3:0
    3 dupond:Q4:1
    3 dupond:Q5:3
    1:S DeSuza:Q2:2
    2:JP Durand:Q2:4
    1:S DeSuza:Q3:3
    1:S DeSuza:Q5:1
    2:JP Durand:Q1:3
    The code. To understand it, when looking at the mapper you have to reason in the order of the type parameters, KEYIN, VALUEIN, KEYOUT, VALUEOUT, with their types; the reducer is a bit trickier.
    package jbetest;
    
    import java.io.IOException;
    import java.util.StringTokenizer;
    
    import jbetest.QuizScore.QuizMapper;
    import jbetest.QuizScore.QuizReducer;
    
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    
    public class QuizScore {
    
    	public static class QuizMapper extends
    			Mapper<LongWritable, Text, Text, LongWritable> {
    
    		private Text word = new Text();
    
    		private LongWritable scoreQ = new LongWritable();
    
    		public void map(LongWritable key, Text value, Context context)
    				throws IOException, InterruptedException {
    
    			if (value.getLength() > 0) {
    				String[] line = value.toString().split(":");
    				try {
    					word.set(line[1]);
    					scoreQ.set(Long.parseLong(line[3]));
    
    					context.write(word, scoreQ);
    				} catch (NumberFormatException e) {
    					// cannot parse - ignore
    				}
    			}
    		}
    	}
    
    	public static class QuizReducer<KEY> extends
    			Reducer<KEY, LongWritable, KEY, LongWritable> {
    
    		private LongWritable result = new LongWritable();
    
    		public void reduce(KEY key, Iterable<LongWritable> values,
    				Context context) throws IOException, InterruptedException {
    			long sum = 0;
    			for (LongWritable val : values) {
    				sum += val.get();
    			}
    			result.set(sum);
    			context.write(key, result);
    		}
    
    	}
    
    	public static void main(String[] args) throws Exception {
    
    		if (args.length != 2) {
    			System.err.println("Usage: QuizScore  <input path> <output path>");
    			System.exit(-1);
    		}
    
    		for (String arg : args) {
    			System.out.println("arg=" + arg);
    		}
    
    		Configuration conf = new Configuration();
    		Job job = Job.getInstance(conf, "quizscoring");
    
    		job.setJarByClass(QuizScore.class);
    
    		job.setMapperClass(QuizMapper.class);
    		job.setCombinerClass(QuizReducer.class);
    		job.setReducerClass(QuizReducer.class);
    
    		job.setOutputKeyClass(Text.class);
    		job.setOutputValueClass(LongWritable.class);
    
    		FileInputFormat.addInputPath(job, new Path(args[0]));
    
    		FileOutputFormat.setOutputPath(job, new Path(args[1]));
    
    		boolean result = job.waitForCompletion(true);
    
    		System.exit(result ? 0 : 1);
    
    	}
    
    }
    The compilemr.sh script:
    #!/bin/bash
    set -x
    if [ "$1" == "" ]; then
    echo "Usage: $0 <java file>"
    exit 1
    fi

    CNAME=${1%.java}
    JARNAME=$CNAME.jar
    JARDIR=/tmp/mr_jars/$CNAME
    CLASSPATH=$(ls $HADOOP_HOME/share/hadoop/common/hadoop-common-?.?.?.jar):$(ls $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-?.?.?.jar):$(ls $HADOOP_HOME/share/hadoop/common/lib/hadoop-annotations-?.?.?.jar)
    echo $CLASSPATH
    mkdir -p $JARDIR
    javac -classpath $CLASSPATH -d $JARDIR/ $1 && jar -cf $JARNAME -C $JARDIR/ .
    Execution: the quizcandidat.csv file must first have been copied to HDFS; the output directory will be created, and the job is rejected if it already exists on HDFS.
    hadoop jar jbetest/QuizScore.jar jbetest.QuizScore /tmp/mapreduce/input/quizcandidat.csv /tmp/mapreduce/output
    arg=/tmp/mapreduce/input/quizcandidat.csv
    arg=/tmp/mapreduce/output
    15/06/15 23:48:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    15/06/15 23:48:12 INFO client.RMProxy: Connecting to ResourceManager at stargate/192.168.0.11:8032
    15/06/15 23:48:12 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
    15/06/15 23:48:13 INFO input.FileInputFormat: Total input paths to process : 1
    15/06/15 23:48:13 INFO mapreduce.JobSubmitter: number of splits:1
    15/06/15 23:48:13 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1434388221382_0018
    15/06/15 23:48:13 INFO impl.YarnClientImpl: Submitted application application_1434388221382_0018
    15/06/15 23:48:13 INFO mapreduce.Job: The url to track the job: http://stargate:8088/proxy/applicati...88221382_0018/
    15/06/15 23:48:13 INFO mapreduce.Job: Running job: job_1434388221382_0018
    15/06/15 23:48:29 INFO mapreduce.Job: Job job_1434388221382_0018 running in uber mode : false
    15/06/15 23:48:29 INFO mapreduce.Job:  map 0% reduce 0%
    15/06/15 23:48:34 INFO mapreduce.Job:  map 100% reduce 0%
    15/06/15 23:48:40 INFO mapreduce.Job:  map 100% reduce 100%
    15/06/15 23:48:40 INFO mapreduce.Job: Job job_1434388221382_0018 completed successfully
    15/06/15 23:48:40 INFO mapreduce.Job: Counters: 49
            File System Counters
                    FILE: Number of bytes read=86
                    FILE: Number of bytes written=214259
                    FILE: Number of read operations=0
                    FILE: Number of large read operations=0
                    FILE: Number of write operations=0
                    HDFS: Number of bytes read=484
                    HDFS: Number of bytes written=49
                    HDFS: Number of read operations=6
                    HDFS: Number of large read operations=0
                    HDFS: Number of write operations=2
            Job Counters
                    Launched map tasks=1
                    Launched reduce tasks=1
                    Data-local map tasks=1
                    Total time spent by all maps in occupied slots (ms)=3168
                    Total time spent by all reduces in occupied slots (ms)=5862
                    Total time spent by all map tasks (ms)=3168
                    Total time spent by all reduce tasks (ms)=2931
                    Total vcore-seconds taken by all map tasks=3168
                    Total vcore-seconds taken by all reduce tasks=2931
                    Total megabyte-seconds taken by all map tasks=4866048
                    Total megabyte-seconds taken by all reduce tasks=9004032
            Map-Reduce Framework
                    Map input records=22
                    Map output records=21
                    Map output bytes=378
                    Map output materialized bytes=86
                    Input split bytes=126
                    Combine input records=21
                    Combine output records=4
                    Reduce input groups=4
                    Reduce shuffle bytes=86
                    Reduce input records=4
                    Reduce output records=4
                    Spilled Records=8
                    Shuffled Maps =1
                    Failed Shuffles=0
                    Merged Map outputs=1
                    GC time elapsed (ms)=57
                    CPU time spent (ms)=1610
                    Physical memory (bytes) snapshot=1070522368
                    Virtual memory (bytes) snapshot=5298221056
                    Total committed heap usage (bytes)=1404043264
            Shuffle Errors
                    BAD_ID=0
                    CONNECTION=0
                    IO_ERROR=0
                    WRONG_LENGTH=0
                    WRONG_MAP=0
                    WRONG_REDUCE=0
            File Input Format Counters
                    Bytes Read=358
            File Output Format Counters
                    Bytes Written=49
    Output

    hduser@stargate:~/mapreduce$ hadoop fs -ls /tmp/mapreduce/output/
    Found 2 items
    -rw-r--r-- 1 hduser supergroup 0 2015-06-15 23:48 /tmp/mapreduce/output/_SUCCESS
    -rw-r--r-- 1 hduser supergroup 49 2015-06-15 23:48 /tmp/mapreduce/output/part-r-00000

    The result file:
    hduser@stargate:~/mapreduce$ hadoop fs -cat /tmp/mapreduce/output/part-r-00000
    C. Martin 0
    JP Durand 21
    P dupond 9
    S DeSuza 7

    hduser@stargate:~/mapreduce$

    Now I'll have to look at how I can filter/sort; a first idea is sketched below.
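
    For the filtering part, a first idea (just a sketch, not something I have run yet) is to filter directly in the mapper, driven by a parameter placed on the Configuration; the property name quiz.minScore below is one I made up, it would be set in main() with conf.setLong("quiz.minScore", ...). For the sorting part, MapReduce only sorts the keys that reach the reducers, so sorting by total score would need either a single reducer or a second job.

    // Sketch: a variant of QuizMapper that filters inside the mapper.
    // It would sit next to QuizMapper inside the QuizScore class and reuses the same imports.
    public static class FilteringQuizMapper
            extends Mapper<LongWritable, Text, Text, LongWritable> {

        private long minScore;                               // threshold, read once per task
        private final Text word = new Text();
        private final LongWritable scoreQ = new LongWritable();

        @Override
        protected void setup(Context context) {
            // "quiz.minScore" is an invented property name, not a standard Hadoop one.
            minScore = context.getConfiguration().getLong("quiz.minScore", 0L);
        }

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] line = value.toString().split(":");
            if (line.length < 4) {
                return;                                       // skip empty or malformed lines
            }
            try {
                long score = Long.parseLong(line[3]);
                if (score >= minScore) {                      // the actual filter
                    word.set(line[1]);
                    scoreQ.set(score);
                    context.write(word, scoreQ);
                }
            } catch (NumberFormatException e) {
                // non-numeric score: ignore the line
            }
        }
    }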

    To be continued

  16. #16
    Well, I'm hitting a new wall. My MapReduce jobs work fine when they are executed directly on the cluster, but now I'm trying to run a MapReduce as a client from Eclipse, and it crashes with a class not found.

    I've seen some plugins, but I only half like that approach: it requires a local Hadoop installation, and I'm on Windows 7 in client mode, working remotely against my Linux cluster.

    The communication with the HDFS/YARN cluster is established fine: it sees my files, it creates my result directory, and then bang!

    It crashes with class not found on the local mapper class of my main class. Huh? Yet they are defined as static inside my main class. Weird.

    Something escapes me; I tried putting them in separate classes and took the same slap.

    At first sight, I thought I could send my serialized classes to my Linux cluster by using job.setJarByClass(QuizScore.class), the class that contains my mapper/reducer classes; they are after all defined as static and public, hence visible from outside.

    I hope I won't have to ship the jar to the cluster, because that would not be very convenient to use.

    Job job = Job.getInstance(conf, "quizscoring");

    job.setJarByClass(QuizScore.class);
    // job.setJar("quizScore.jar");

    job.setJobName("quizscoring");

    job.setUser("hduser");
    job.setNumReduceTasks(4);

    job.setMapperClass(QuizMapper.class);
    // job.setCombinerClass(QuizReducer.class);
    job.setReducerClass(QuizReducer.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);

    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));

    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    boolean result = job.waitForCompletion(true);
    System.out.println("End result=" + result);
    arg=/usr/hadoop/
    arg=/usr/hadoop/result
    log4j:WARN No appenders could be found for logger (org.apache.hadoop.conf.Configuration.deprecation).
    log4j:WARN Please initialize the log4j system properly.
    log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
    Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class jbetest.QuizScore$QuizMapper not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2074)
    at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:742)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
    Caused by: java.lang.ClassNotFoundException: Class jbetest.QuizScore$QuizMapper not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1980)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2072)
    ... 8 more
    Either there's something I haven't understood about running as a client, or I have a configuration problem somewhere.

    Any ideas? As usual, I'll have to dig to find out where this comes from; I'll come back once I've found it.


    To be continued

  17. #17
    Here I am again.

    It's exactly what I feared: I have to feed it the jar, otherwise it's not happy. It doesn't seem able to make do with just the classes, which is a pity, but it is understandable if other classes have to be shipped besides the mapper/reducer implementations.

    With Eclipse, you must add the project jar to the job configuration in addition to the mapper/reducer classes. At execution time, also watch the permissions of the hadoop directories, otherwise the job is rejected with permission denied (see the example command right below).
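
    For example (illustration only, adapt the path to your own layout), opening up the working directory on HDFS is usually enough:

    hadoop fs -chmod -R 775 /tmp/mapreduce

    Another option is to submit under the owner of the directories, for instance by setting the HADOOP_USER_NAME environment variable on the client (simple authentication only).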

    So here is the infamous hard-coded call in the job description; all that remains is to make it more configurable (a small sketch follows the line below). Without it my client does not work. I'm not using any plugin, this is raw, direct code.

    job.setJar("/Users/bordi/.m2/repository/QuizScore/QuizScore/0.0.1-SNAPSHOT/QuizScore-0.0.1-SNAPSHOT.jar");
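
    To avoid the hard-coded path, a small sketch of what I have in mind (the quizscore.jar property name is my own invention, not a Hadoop setting): read the path from a JVM system property set in the Eclipse launch configuration, with the current path as a fallback.

    // e.g. add -Dquizscore.jar=C:/path/to/QuizScore-0.0.1-SNAPSHOT.jar to the VM arguments
    String jarPath = System.getProperty("quizscore.jar",
            "/Users/bordi/.m2/repository/QuizScore/QuizScore/0.0.1-SNAPSHOT/QuizScore-0.0.1-SNAPSHOT.jar");
    job.setJar(jarPath);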

    I set a single reducer, otherwise the output is spread over as many result files as there are reducers; here I get a single result file.
    My input file on hdfs:

    hduser@stargate:/usr/local/hadoop/logs$ hadoop fs -cat /tmp/mapreduce/input/quizcandidat.csv
    1:S DeSuza:Q4:0
    2:JP Durand:Q2:2
    3:P dupond:Q3:0
    2:JP Durand:Q4:1
    2:JP Durand:Q5:5
    3:C. Martin:Q1:0
    2:JP Durand:Q2:4
    2:JP Durand:Q3:1
    1:S DeSuza:Q1:1
    2:JP Durand:Q4:0
    2:JP Durand:Q5:1
    3:P dupond:Q1:2
    3:P dupond:Q2:3
    2:JP Durand:Q3:0
    3:P dupond:Q4:1
    3:P dupond:Q5:3
    1:S DeSuza:Q2:2
    2:JP Durand:Q2:4
    1:S DeSuza:Q3:3
    1:S DeSuza:Q5:1
    2:JP Durand:Q1:3


    hduser@stargate:/usr/local/hadoop/logs$ hadoop fs -ls /tmp/mapreduce/result
    Found 2 items
    -rw-r--r-- 3 bordi supergroup 0 2015-06-19 17:37 /tmp/mapreduce/result/_SUCCESS
    -rw-r--r-- 3 bordi supergroup 49 2015-06-19 17:37 /tmp/mapreduce/result/part-r-00000

    hduser@stargate:/usr/local/hadoop/logs$ hadoop fs -cat /tmp/mapreduce/result/part-r-00000
    C. Martin 0
    JP Durand 21
    P dupond 9
    S DeSuza 7


    Here is the code to run from Eclipse on Windows 7 as a client against a Linux cluster.
    package jbetest;
    
    import java.io.IOException;
    
    import java.util.StringTokenizer;
    
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.Reducer.Context;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
    
    public class QuizScore {
    
    	public static class QuizMapper extends
    			Mapper<LongWritable, Text, Text, LongWritable> {
    
    		private Text word = new Text();
    
    		private LongWritable scoreQ = new LongWritable();
    
    		@Override
    		public void map(LongWritable key, Text value, Context context)
    				throws IOException, InterruptedException {
    
    			if (value.getLength() > 0) {
    				String[] line = value.toString().split(":");
    				try {
    					word.set(line[1]);
    					scoreQ.set(Long.parseLong(line[3]));
    
    					context.write(word, scoreQ);
    				} catch (NumberFormatException e) {
    					// cannot parse - ignore
    				}
    			}
    		}
    	}
    
    	public static class QuizReducer<KEY> extends
    			Reducer<KEY, LongWritable, KEY, LongWritable> {
    
    		private LongWritable result = new LongWritable();
    
    		public void reduce(KEY key, Iterable<LongWritable> values,
    				Context context) throws IOException, InterruptedException {
    			long sum = 0;
    			for (LongWritable val : values) {
    				sum += val.get();
    			}
    			result.set(sum);		
    			context.write(key, result);
    		}
    
    	}
    
    	public static void main(String[] args) throws Exception {
    
    		if (args.length != 2) {
    			System.err.println("Usage: QuizScore  <input path> <output path>");
    			System.exit(-1);
    		}
    
    		for (String arg : args) {
    			System.out.println("arg=" + arg);
    		}
    
    		Configuration conf = new Configuration();
    
    		// conf.set("fs.defaultFS", "hdfs://192.168.0.11:9000");
    
    		conf.set("fs.default.name", "hdfs://192.168.0.11:9000");
    		conf.set("yarn.resourcemanager.scheduler.address", "192.168.0.11:8030");
    		conf.set("yarn.resourcemanager.resource-tracker.address",
    				"192.168.0.11:8031");
    		conf.set("yarn.resourcemanager.address", "192.168.0.11:8032");
    
    		conf.set("mapreduce.jobhistory.address", "192.168.0.11:10020");
    
    		conf.set("mapred.job.tracker", "192.168.0.11:54311");
    
    		conf.set("mapreduce.framework.name", "yarn");
    
    		conf.set("mapreduce.app-submission.cross-platform", "true");
    		conf.set("hadoop.job.ugi", "hduser");
    
    
    		Job job = Job.getInstance(conf, "quizscoring");
    
    		job.setJarByClass(QuizScore.class); // not sufficient on its own for remote submission; setJar must also be called
    	
    		job.setJobName("quizscoring");
    
    		job.setUser("hduser");
    		job.setNumReduceTasks(1);
    
    		job.setMapperClass(QuizMapper.class);
    		job.setCombinerClass(QuizReducer.class);
    		job.setReducerClass(QuizReducer.class);
    
    		job.setOutputKeyClass(Text.class);
    		job.setOutputValueClass(LongWritable.class);
    		
    		job.setMapOutputKeyClass(Text.class);
    		job.setMapOutputValueClass(LongWritable.class);
    		
    		job.setInputFormatClass(TextInputFormat.class);
    		job.setOutputFormatClass(TextOutputFormat.class);
    		job.setJar("/Users/bordi/.m2/repository/QuizScore/QuizScore/0.0.1-SNAPSHOT/QuizScore-0.0.1-SNAPSHOT.jar"); // hard-coded path, really not nice
    			
    		FileInputFormat.addInputPath(job, new Path(args[0]));
    
    		FileOutputFormat.setOutputPath(job, new Path(args[1]));
    
    		boolean result = job.waitForCompletion(true);
    		System.out.println("End result=" + result);
    		System.exit(result ? 0 : 1);
    
    	}
    
    }
    job history

    http://192.168.0.11:19888/jobhistory/app

    2015.06.19 17:36:36 CEST 2015.06.19 17:36:40 CEST 2015.06.19 17:37:04 CEST job_1434718393926_0017 quizscoring bordi default SUCCEEDED 1 1 1 1


    Another version of the code, this time using ToolRunner, so the job can eventually be handled through an oozie workflow and not only seen in the jobhistory.

    
    
    import java.security.PrivilegedExceptionAction;
    
    import jbetest.QuizScore.QuizMapper;
    import jbetest.QuizScore.QuizReducer;
    
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
    import org.apache.hadoop.security.UserGroupInformation;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;
    
    
    public class QuizScoreJob extends Configured implements Tool{
    
    	 static Configuration conf = new Configuration();
    	 
        public static void main(String[] args) throws Exception {
        	
           
            
            int res = ToolRunner.run(conf, new QuizScoreJob(), args);
            System.exit(res);       
        }
    
        
        public int run(String[] args) throws Exception {
    
        		try {
        				UserGroupInformation ugi
        				= UserGroupInformation.createRemoteUser("hduser");
      		      	
    			    	ugi.doAs(new PrivilegedExceptionAction<Void>() {
    			
    			        	public Void run() throws Exception {
    			        		
    			        	System.out.println("debut");
    			        			
    			        	
    					        String input="/tmp/mapreduce/input/quizcandidat.csv";
    					        String output="/tmp/mapreduce/result2";
    					        
    
    					        conf.set("fs.default.name", "hdfs://192.168.0.11:9000");
    							conf.set("yarn.resourcemanager.scheduler.address", "192.168.0.11:8030");
    							conf.set("yarn.resourcemanager.resource-tracker.address",
    									"192.168.0.11:8031");
    							conf.set("yarn.resourcemanager.address", "192.168.0.11:8032");
    
    							conf.set("mapreduce.jobhistory.address", "192.168.0.11:10020");
    
    							conf.set("mapred.job.tracker", "192.168.0.11:54311");
    
    							conf.set("mapreduce.framework.name", "yarn");
    
    							conf.set("mapreduce.app-submission.cross-platform", "true");
    							conf.set("hadoop.job.ugi", "hduser");
    
    					  
    					        Job job =  Job.getInstance(conf);
    					        job.setJarByClass(QuizScoreJob.class);
    					        job.setJobName("Job QuizScore");
    					        
    					    	job.setUser("hduser");
    							job.setNumReduceTasks(1);
    
    							job.setMapperClass(QuizMapper.class);
    							job.setCombinerClass(QuizReducer.class);
    							job.setReducerClass(QuizReducer.class);
    
    							job.setOutputKeyClass(Text.class);
    							job.setOutputValueClass(LongWritable.class);
    							
    							job.setMapOutputKeyClass(Text.class);
    							job.setMapOutputValueClass(LongWritable.class);
    							
    							job.setInputFormatClass(TextInputFormat.class);
    							job.setOutputFormatClass(TextOutputFormat.class);
    							
    					        // Job Input path
    					        FileInputFormat.addInputPath(job, new  
    					        Path(input)); 	
    					        // Job Output path
    					        FileOutputFormat.setOutputPath(job, new 
    					        Path(output)); 
    					        job.setJar("/Users/bordi/.m2/repository/QuizScore/QuizScore/0.0.1-SNAPSHOT/QuizScore-0.0.1-SNAPSHOT.jar");
    				
    					        System.out.println("call submit");
    					
    					        boolean bool=job.waitForCompletion(true);
    					        
    					        System.out.println(job.getSchedulingInfo());
    					        
    					        System.out.println("find boolean job="+bool);
    							return null; /// return run
    						      
    					        }} );// run
    			     	
    			              
    			    } catch (Exception e) {
    			             e.printStackTrace();
    			    }
    
        		return 0;
        }
    }
    One important thing: here are the dependencies I use to run my project, the rest is standard.

    
    
    	<dependencies>
    		<dependency>
    			<groupId>org.apache.hadoop</groupId>
    			<artifactId>hadoop-common</artifactId>
    			<version>2.6.0</version>
    		</dependency>
    		<dependency>
    			<groupId>org.apache.hadoop</groupId>
    			<artifactId>hadoop-mapreduce-client-core</artifactId>
    			<version>2.6.0</version>
    		</dependency>
    		<dependency>
    			<groupId>org.apache.hadoop</groupId>
    			<artifactId>hadoop-mapreduce-client-jobclient</artifactId>
    			<version>2.6.0</version>
    		</dependency>
    			<dependency>
    				<groupId>org.apache.hadoop</groupId>
    				<artifactId>hadoop-mapreduce-client-shuffle</artifactId>
    				<version>2.6.0</version>
    			</dependency>
    
    		<dependency>
    			<groupId>org.apache.hadoop</groupId>
    			<artifactId>hadoop-hdfs</artifactId>
    			<version>2.6.0</version>
    		</dependency>
    
    	</dependencies>
    To be continued

  18. #18
    Using hive-jdbc from Eclipse to send queries to the hive server: it works quite well, and it's easier than going through map reduce.

    Watch out for version consistency, hadoop does not forgive.

    Hive version 0.14
    hive jdbc driver 1.1.1 (org.apache.hive.jdbc.HiveDriver)


    
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;
    
    public class TestHiveRemote {
    	private static String driverName = "org.apache.hive.jdbc.HiveDriver";
    
    	public static void main(String[] args) throws SQLException {
    
    		try {
    			// Register driver and create driver instance
    			Class.forName(driverName);
    		} catch (ClassNotFoundException ex) {
    
    		}
    
    		// get connection
    		System.out.println("before trying to connect");
    		Connection con = DriverManager.getConnection(
    				"jdbc:hive2://192.168.0.11:10000/jbedb", "hduser", "servus");
    		System.out.println("connected");
    
    		// create statement
    		Statement stmt = con.createStatement();
    		// show tables
    
    		drop_table_consultant(stmt);
    		
    		create_table_consultant(stmt);
    		
    		describe_table(stmt);
    		
    		load_data_consultant(stmt);
    		
    		select_consultant(stmt);
    		
    		con.close();
    		System.out.println("=============================");
    		System.out.println("fini");
    
    	}
    
    	static void show_tables(Statement stmt) throws SQLException {
    
    		String sql = "show tables";
    		System.out.println("Running: " + sql);
    		ResultSet res = stmt.executeQuery(sql);
    
    		while (res.next()) {
    
    			System.out.println("str=" + res.getString("tab_name"));
    		}
    
    	}
    	static void drop_table_consultant( Statement stmt) throws SQLException {
    		System.out.println("===========DROP==============");
    		// execute statement
    		stmt.execute("DROP TABLE IF EXISTS "
    				+ " consultant");
    
    		System.out.println("Table employee droped.");
    	}
    	static void create_table_consultant( Statement stmt) throws SQLException {
    		System.out.println("===========CREATE============");
    		// execute statement
    		stmt.execute("CREATE TABLE IF NOT EXISTS "
    				+ " consultant ( eid int, nom String, salaire String, job String, serviceId String )"
    				+ " ROW FORMAT DELIMITED" + " FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'" 
    				+ " STORED AS TEXTFILE ");
    
    		System.out.println("Table employee created.");
    	}
    	
    	
    	static void describe_table(Statement stmt) throws SQLException {
    
    		System.out.println("=========DESCRIBE============");
    		String sql = "desc consultant";
    		System.out.println("describe : " + sql);
    		ResultSet res = stmt.executeQuery(sql);
    
    		while (res.next()) {
    
    			System.out.println(res.getString("col_name")+" "+res.getString("data_type"));
    		}
    
    	}
    	
    	static void load_data_consultant( Statement stmt) throws SQLException {
    		System.out.println("===========LOAD==============");
    		stmt.execute("LOAD DATA LOCAL INPATH '/tmp/consultant.txt' INTO TABLE  consultant");
    	}
    	
    	static void select_consultant( Statement stmt) throws SQLException {
    		System.out.println("===========SELECT============");
    		System.out.println("list consultant");
    		ResultSet res=stmt.executeQuery("SELECT * FROM consultant");
    		
    		while (res.next() ) {
    			
    			System.out.println( res.getInt("eid")+" "+res.getString("nom")+" "+res.getString("salaire")+" "+res.getString("job"));
    		}
    	}
    }
    dependencies
    <dependencies>
    		<dependency>
    			<groupId>org.apache.hadoop</groupId>
    			<artifactId>hadoop-common</artifactId>
    			<version>2.6.0</version>
    		</dependency>
    		<dependency>
    			<groupId>org.apache.hadoop</groupId>
    			<artifactId>hadoop-mapreduce-client-core</artifactId>
    			<version>2.6.0</version>
    		</dependency>
    		<dependency>
    			<groupId>org.apache.hadoop</groupId>
    			<artifactId>hadoop-mapreduce-client-jobclient</artifactId>
    			<version>2.6.0</version>
    		</dependency>
    			<dependency>
    				<groupId>org.apache.hadoop</groupId>
    				<artifactId>hadoop-mapreduce-client-shuffle</artifactId>
    				<version>2.6.0</version>
    			</dependency>
    
    		<dependency>
    			<groupId>org.apache.hadoop</groupId>
    			<artifactId>hadoop-hdfs</artifactId>
    			<version>2.6.0</version>
    		</dependency>
    		
    		<dependency>
    			<groupId>org.apache.hive</groupId>
    			<artifactId>hive-jdbc</artifactId>
    			<version>1.1.1</version>
    		</dependency>
    Console output
    connected
    ===========DROP==============
    Table employee droped.
    ===========CREATE============
    Table employee created.
    =========DESCRIBE============
    describe : desc consultant
    eid int
    nom string
    salaire string
    job string
    serviceid string
    ===========LOAD==============
    ===========SELECT============
    list consultant
    1 S.Dupont 450000 Developpeur
    1 J.Milou 350000 integrateur
    1 V.Marin 370000 Manager
    =============================
    fini

    Pig and hbase remain to be looked at.

    To be continued

  19. #19
    As you can guess, I work with Eclipse in remote mode against my Linux cluster; that works fine for java map reduce jobs and for java jobs using hive jdbc.

    Now I'm working on pig jobs from Eclipse and it's a bit more complicated; quite a bit of tinkering is required.

    First, pig has to be compiled for the hadoop version you are working with, otherwise it assumes it is running on hadoop1 and everything blows up when my program executes, because the interfaces are incompatible.

    You can see the errors by adding a log4j.properties, otherwise you see nothing at all.

    You have to fetch the sources on Windows and build at the root of the project with the following command; it can generate different jars depending on the version:

    ant hadoopversion=23 jar

    Then you have to pick up the generated pig snapshot; in my case I installed it manually into my local m2 repository and added the dependency to my pom.xml (a possible install command is sketched below).
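
    For the record, the manual install into the local repository can be done with the standard Maven command, along these lines (jar path to adapt; the coordinates match the dependency declared in the excerpt below):

    mvn install:install-file -Dfile=pig-0.14.0-SNAPSHOT.jar -DgroupId=org.apache.pig -DartifactId=pig -Dversion=0.14.0-SNAPSHOT -Dpackaging=jar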

    pom.xml excerpt
    <dependencies>
    		<dependency>
    			<groupId>org.apache.hadoop</groupId>
    			<artifactId>hadoop-common</artifactId>
    			<version>2.6.0</version>
    		</dependency>
    		<dependency>
    			<groupId>org.apache.hadoop</groupId>
    			<artifactId>hadoop-mapreduce-client-core</artifactId>
    			<version>2.6.0</version>
    		</dependency>
    		<dependency>
    			<groupId>org.apache.hadoop</groupId>
    			<artifactId>hadoop-mapreduce-client-jobclient</artifactId>
    			<version>2.6.0</version>
    		</dependency>
    		<dependency>
    			<groupId>org.apache.hadoop</groupId>
    			<artifactId>hadoop-mapreduce-client-shuffle</artifactId>
    			<version>2.6.0</version>
    		</dependency>
    
    		<dependency>
    			<groupId>org.apache.hadoop</groupId>
    			<artifactId>hadoop-hdfs</artifactId>
    			<version>2.6.0</version>
    		</dependency>
    
    		<dependency>
    			<groupId>org.apache.hive</groupId>
    			<artifactId>hive-jdbc</artifactId>
    			<version>1.1.1</version>
    		</dependency>
    
    		<dependency>
    			<groupId>org.apache.pig</groupId>
    			<artifactId>pig</artifactId>
    			<version>0.14.0-SNAPSHOT</version>
    		</dependency>
    
    		<dependency>
    			<groupId>joda-time</groupId>
    			<artifactId>joda-time</artifactId>
    			<version>2.8.1</version>
    		</dependency>
    
    		<dependency>
    			<groupId>dk.brics.automaton</groupId>
    			<artifactId>automaton</artifactId>
    			<version>1.11-8</version>
    		</dependency>
    
    	</dependencies>
    Note that I'm using pig 0.14 and the build fails on a jetty dependency; this was fixed in pig 0.15, so I had to back-port the change from the 0.15 build.xml into the 0.14 one by comparing the two files. After that I was able to produce the 0.14.0-SNAPSHOT jar.

    Now that this problem is solved, I was finally able to launch a remote job on my cluster with yarn... and hit the next snag.

    Note that the application history now sees my job, so that's progress; before, I couldn't even get the task to launch in the cluster. Pig seems more demanding in remote mode than plain map reduce.

    Strange; something is missing on the hadoop side.

    Cluster log (yarn log): it fails on "/bin/bash: ligne 0 : fg: pas de contrôle de tâche" (no job control).

    2015-06-21 11:48:52,718 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application Attempt appattempt_1434868270867_0003_000002 is done. finalState=FAILED
    2015-06-21 11:48:52,718 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Application application_1434868270867_0003 failed 2 times due to AM Container for appattempt_1434868270867_0003_000002 exited with exitCode: 1
    For more detailed output, check application tracking page:http://stargate:8088/proxy/applicati...0867_0003/Then, click on links to logs of each attempt.
    Diagnostics: Exception from container-launch.
    Container id: container_1434868270867_0003_02_000001
    Exit code: 1
    Exception message: /bin/bash: ligne 0 : fg: pas de contrôle de tâche

    Stack trace: ExitCodeException exitCode=1: /bin/bash: ligne 0 : fg: pas de contrôle de tâche

    at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
    at org.apache.hadoop.util.Shell.run(Shell.java:455)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)


    Container exited with a non-zero exit code 1
    Failing this attempt. Failing the application.
    2015-06-21 11:48:52,719 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: Application application_1434868270867_0003 requests cleared
    My code
    import java.io.IOException;
    
    import java.util.List;
    import java.util.Properties;
    
    import org.apache.pig.ExecType;
    import org.apache.pig.PigServer;
    import org.apache.pig.backend.executionengine.ExecException;
    import org.apache.pig.backend.executionengine.ExecJob;
    	
    public class TestPigRemote {
    
    	public static class idmapreduce{
    	   public static void main(String[] args) throws IOException {
    	   
    	   System.out.println("début");
    		
    		 Properties props = new Properties();
    		 props.setProperty("fs.default.name", "hdfs://192.168.0.11:9000");
    		 props.setProperty("mapred.job.tracker", "192.168.0.11:54311");
    	
    		 
    		 PigServer pigServer = new PigServer(ExecType.MAPREDUCE, props);
    		  
    		  
    	   try {
    		   
    	   
    		  runIdQueryBatch(pigServer, "/tmp/consultant.txt");
    		
    	     Thread.sleep(30000);
    
    	   }
    	   catch(Exception e) {	
    		   e.printStackTrace();
    	   }
    	   if (pigServer.isBatchOn() ) {
    		   pigServer.shutdown();
    	   }
    	   
    	   System.out.println("fin");
    	}
    	   
    
    	   
    	public static void runIdQueryBatch(PigServer pigServer, String inputFile) throws IOException {
    
    		System.out.println("traitement lancement");
    		pigServer.setJobName("test pig remoting");
    		
    		 pigServer.setBatchOn();
    		    pigServer.debugOn();
    		    pigServer.setValidateEachStatement(true);
    
    	     runIdQuery( pigServer,  inputFile);
    	     
    	     List<ExecJob> jobs=pigServer.executeBatch(true);
    	     
    	     System.out.println("wait end job complete");
    	     for ( ExecJob job : jobs  ) {
    	    	 
    		     while ( job.hasCompleted() == false ) {
    		    	 try {
    					Thread.sleep(10000);
    				} catch (InterruptedException e) {
    					continue;
    				}
    		    	
    		     }
    
    	    	 System.out.println("job="+job.getStatus());
    	    	 System.out.println("job="+job.getConfiguration());
    	    	 System.out.println("job="+job.hasCompleted());
    	    	 
    	    	 if (  job.getException() != null ) {
    	    		 job.getException().printStackTrace();
    	    	 }
    	    	 
    	     }
    
    	     
    	     System.out.println("traitement terminé");
    	   }
    	}
    	
    	   
    	   public static void runIdQuery(PigServer pigServer, String inputFile) throws IOException {
    			pigServer.setValidateEachStatement(true);
    		     pigServer.registerQuery("A = load '"+inputFile+"' using PigStorage(',');");
    		     pigServer.registerQuery("B = foreach A generate $0 as id;");
    		     pigServer.store("B", "/tmp/idout.txt");
    		   
    		}
    }
    Eclipse console log after running the program.

    début
    log4j:WARN No appenders could be found for logger (org.apache.hadoop.conf.Configuration.deprecation).
    log4j:WARN Please initialize the log4j system properly.
    log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
    2    [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine  - Connecting to hadoop file system at: hdfs://192.168.0.11:9000
    711  [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine  - Connecting to map-reduce job tracker at: 192.168.0.11:54311
    traitement lancement
    854  [main] DEBUG org.apache.pig.parser.QueryParserDriver  - Original macro AST:
    (QUERY (STATEMENT A (load '/tmp/consultant.txt' (FUNC PigStorage ','))))
    
    854  [main] DEBUG org.apache.pig.parser.QueryParserDriver  - macro AST after import:
    (QUERY (STATEMENT A (load '/tmp/consultant.txt' (FUNC PigStorage ','))))
    
    854  [main] DEBUG org.apache.pig.parser.QueryParserDriver  - Resulting macro AST:
    (QUERY (STATEMENT A (load '/tmp/consultant.txt' (FUNC PigStorage ','))))
    
    5672 [main] DEBUG org.apache.pig.builtin.JsonMetadata  - Could not find schema file for /tmp/consultant.txt
    5695 [main] DEBUG org.apache.pig.JVMReuseImpl  - Method cleanupStaticData in class class org.apache.pig.impl.util.UDFContext registered for static data cleanup
    5696 [main] DEBUG org.apache.pig.parser.QueryParserDriver  - Original macro AST:
    (QUERY (STATEMENT A (load '/tmp/consultant.txt' (FUNC PigStorage ','))))
    
    5696 [main] DEBUG org.apache.pig.parser.QueryParserDriver  - macro AST after import:
    (QUERY (STATEMENT A (load '/tmp/consultant.txt' (FUNC PigStorage ','))))
    
    5696 [main] DEBUG org.apache.pig.parser.QueryParserDriver  - Resulting macro AST:
    (QUERY (STATEMENT A (load '/tmp/consultant.txt' (FUNC PigStorage ','))))
    
    5708 [main] DEBUG org.apache.pig.builtin.JsonMetadata  - Could not find schema file for /tmp/consultant.txt
    5713 [main] DEBUG org.apache.pig.parser.QueryParserDriver  - Original macro AST:
    (QUERY (STATEMENT A (load '/tmp/consultant.txt' (FUNC PigStorage ','))) (STATEMENT B (foreach A (FOREACH_PLAN_SIMPLE (generate $0 (FIELD_DEF id))))))
    
    5713 [main] DEBUG org.apache.pig.parser.QueryParserDriver  - macro AST after import:
    (QUERY (STATEMENT A (load '/tmp/consultant.txt' (FUNC PigStorage ','))) (STATEMENT B (foreach A (FOREACH_PLAN_SIMPLE (generate $0 (FIELD_DEF id))))))
    
    5713 [main] DEBUG org.apache.pig.parser.QueryParserDriver  - Resulting macro AST:
    (QUERY (STATEMENT A (load '/tmp/consultant.txt' (FUNC PigStorage ','))) (STATEMENT B (foreach A (FOREACH_PLAN_SIMPLE (generate $0 (FIELD_DEF id))))))
    
    5731 [main] DEBUG org.apache.pig.builtin.JsonMetadata  - Could not find schema file for /tmp/consultant.txt
    5758 [main] DEBUG org.apache.pig.parser.QueryParserDriver  - Original macro AST:
    (QUERY (STATEMENT A (load '/tmp/consultant.txt' (FUNC PigStorage ','))) (STATEMENT B (foreach A (FOREACH_PLAN_SIMPLE (generate $0 (FIELD_DEF id))))))
    
    5758 [main] DEBUG org.apache.pig.parser.QueryParserDriver  - macro AST after import:
    (QUERY (STATEMENT A (load '/tmp/consultant.txt' (FUNC PigStorage ','))) (STATEMENT B (foreach A (FOREACH_PLAN_SIMPLE (generate $0 (FIELD_DEF id))))))
    
    5759 [main] DEBUG org.apache.pig.parser.QueryParserDriver  - Resulting macro AST:
    (QUERY (STATEMENT A (load '/tmp/consultant.txt' (FUNC PigStorage ','))) (STATEMENT B (foreach A (FOREACH_PLAN_SIMPLE (generate $0 (FIELD_DEF id))))))
    
    5770 [main] DEBUG org.apache.pig.builtin.JsonMetadata  - Could not find schema file for /tmp/consultant.txt
    5771 [main] DEBUG org.apache.pig.parser.QueryParserDriver  - Original macro AST:
    (QUERY (STATEMENT A (load '/tmp/consultant.txt' (FUNC PigStorage ','))) (STATEMENT B (foreach A (FOREACH_PLAN_SIMPLE (generate $0 (FIELD_DEF id))))))
    
    5771 [main] DEBUG org.apache.pig.parser.QueryParserDriver  - macro AST after import:
    (QUERY (STATEMENT A (load '/tmp/consultant.txt' (FUNC PigStorage ','))) (STATEMENT B (foreach A (FOREACH_PLAN_SIMPLE (generate $0 (FIELD_DEF id))))))
    
    5771 [main] DEBUG org.apache.pig.parser.QueryParserDriver  - Resulting macro AST:
    (QUERY (STATEMENT A (load '/tmp/consultant.txt' (FUNC PigStorage ','))) (STATEMENT B (foreach A (FOREACH_PLAN_SIMPLE (generate $0 (FIELD_DEF id))))))
    
    5782 [main] DEBUG org.apache.pig.builtin.JsonMetadata  - Could not find schema file for /tmp/consultant.txt
    5795 [main] INFO  org.apache.pig.tools.pigstats.ScriptState  - Pig features used in the script: UNKNOWN
    5832 [main] INFO  org.apache.pig.data.SchemaTupleBackend  - Key [pig.schematuple] was not set... will not generate code.
    5858 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer  - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
    5903 [main] DEBUG org.apache.pig.JVMReuseImpl  - Method staticDataCleanup in class class org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator registered for static data cleanup
    5922 [main] DEBUG org.apache.pig.impl.util.SpillableMemoryManager  - Found heap (Code Cache) of type Non-heap memory
    5922 [main] DEBUG org.apache.pig.impl.util.SpillableMemoryManager  - Found heap (PS Eden Space) of type Heap memory
    5923 [main] DEBUG org.apache.pig.impl.util.SpillableMemoryManager  - Found heap (PS Survivor Space) of type Heap memory
    5923 [main] DEBUG org.apache.pig.impl.util.SpillableMemoryManager  - Found heap (PS Old Gen) of type Heap memory
    5923 [main] DEBUG org.apache.pig.impl.util.SpillableMemoryManager  - Found heap (PS Perm Gen) of type Non-heap memory
    5923 [main] DEBUG org.apache.pig.impl.util.SpillableMemoryManager  - Selected heap to monitor (PS Old Gen)
    5925 [main] DEBUG org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema  - t: 50 Bag: 120 tuple: 110
    5927 [main] DEBUG org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema  - t: 50 Bag: 120 tuple: 110
    5929 [main] DEBUG org.apache.pig.data.SchemaTupleFrontend  - Registering Schema for generation [{bytearray}] with id [0] and context: FOREACH
    5932 [main] DEBUG org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema  - t: 50 Bag: 120 tuple: 110
    5950 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler  - File concatenation threshold: 100 optimistic? false
    5962 [main] DEBUG org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SampleOptimizer  - Not a sampling job.
    5967 [main] DEBUG org.apache.pig.backend.hadoop.executionengine.util.SecondaryKeyOptimizerUtil  - Cannot find POLocalRearrange or POUnion in map leaf, skip secondary key optimizing
    5974 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer  - MR plan size before optimization: 1
    5974 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer  - MR plan size after optimization: 1
    6182 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.MRScriptState  - Pig script settings are added to the job
    6187 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler  - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
    6189 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler  - This job cannot be converted run in-process
    6528 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler  - Added jar file:/C:/Users/bordi/.m2/repository/org/apache/pig/pig/0.14.0-SNAPSHOT/pig-0.14.0-SNAPSHOT.jar to DistributedCache through /tmp/temp625569534/tmp-921934698/pig-0.14.0-SNAPSHOT.jar
    6566 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler  - Added jar file:/C:/Users/bordi/.m2/repository/dk/brics/automaton/automaton/1.11-8/automaton-1.11-8.jar to DistributedCache through /tmp/temp625569534/tmp1321168983/automaton-1.11-8.jar
    6599 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler  - Added jar file:/C:/Users/bordi/.m2/repository/org/antlr/antlr-runtime/3.4/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp625569534/tmp180981958/antlr-runtime-3.4.jar
    6649 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler  - Added jar file:/C:/Users/bordi/.m2/repository/com/google/guava/guava/11.0.2/guava-11.0.2.jar to DistributedCache through /tmp/temp625569534/tmp484504265/guava-11.0.2.jar
    6691 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler  - Added jar file:/C:/Users/bordi/.m2/repository/joda-time/joda-time/2.8.1/joda-time-2.8.1.jar to DistributedCache through /tmp/temp625569534/tmp512577559/joda-time-2.8.1.jar
    6739 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler  - Setting up single store job
    6744 [main] DEBUG org.apache.pig.data.SchemaTupleFrontend  - Temporary directory for generated code created: C:\Users\bordi\AppData\Local\Temp\1434884627289-0
    6744 [main] INFO  org.apache.pig.data.SchemaTupleFrontend  - Key [pig.schematuple] is false, will not generate code.
    6744 [main] INFO  org.apache.pig.data.SchemaTupleFrontend  - Starting process to move generated code to distributed cacche
    6745 [main] INFO  org.apache.pig.data.SchemaTupleFrontend  - Setting key [pig.schematuple.classes] with classes to deserialize []
    6769 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - 1 map-reduce job(s) waiting for submission.
    6770 [JobControl] DEBUG org.apache.pig.backend.hadoop23.PigJobControl  - Checking state of job job name:	
    job id:	job_pigexec_0
    job state:	WAITING
    job mapred id:	null
    job message:	just initialized
    job has no depending job:	
    
    7026 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil  - Total input paths to process : 1
    7040 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil  - Total input paths (combined) to process : 1
    7463 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - HadoopJobId: job_1434868270867_0006
    7464 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - Processing aliases A,B
    7464 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - detailed locations: M: A[1,4],B[1,57] C:  R: 
    7470 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - 0% complete
    7470 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - Running jobs are [job_1434868270867_0006]
    12463 [JobControl] DEBUG org.apache.pig.backend.hadoop23.PigJobControl  - Checking state of job job name:	N/A
    job id:	job_pigexec_0
    job state:	RUNNING
    job mapred id:	job_1434868270867_0006
    job message:	just initialized
    job has no depending job:	
    
    12480 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
    12480 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - job job_1434868270867_0006 has failed! Stop running all dependent jobs
    12480 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - 100% complete
    12554 [main] DEBUG org.apache.pig.tools.pigstats.PigStats  - unable to set backend exception
    12554 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil  - 1 map reduce job(s) failed!
    12559 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.SimplePigStats  - Script Statistics: 
    
    HadoopVersion	PigVersion	UserId	StartedAt	FinishedAt	Features
    2.6.0	0.14.0-SNAPSHOT	bordi	2015-06-21 13:03:46	2015-06-21 13:03:53	UNKNOWN
    
    Failed!
    
    Failed Jobs:
    JobId	Alias	Feature	Message	Outputs
    job_1434868270867_0006	A,B	MAP_ONLY	Message: Job failed!	/tmp/idout.txt,
    
    Input(s):
    Failed to read data from "/tmp/consultant.txt"
    
    Output(s):
    Failed to produce result in "/tmp/idout.txt"
    
    Counters:
    Total records written : 0
    Total bytes written : 0
    Spillable Memory Manager spill count : 0
    Total bags proactively spilled: 0
    Total records proactively spilled: 0
    
    Job DAG:
    job_1434868270867_0006
    
    
    12559 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - Failed!
    12562 [main] DEBUG org.apache.pig.parser.QueryParserDriver  - Original macro AST:
    (QUERY (STATEMENT A (load '/tmp/consultant.txt' (FUNC PigStorage ','))) (STATEMENT B (foreach A (FOREACH_PLAN_SIMPLE (generate $0 (FIELD_DEF id))))))
    
    12562 [main] DEBUG org.apache.pig.parser.QueryParserDriver  - macro AST after import:
    (QUERY (STATEMENT A (load '/tmp/consultant.txt' (FUNC PigStorage ','))) (STATEMENT B (foreach A (FOREACH_PLAN_SIMPLE (generate $0 (FIELD_DEF id))))))
    
    12562 [main] DEBUG org.apache.pig.parser.QueryParserDriver  - Resulting macro AST:
    (QUERY (STATEMENT A (load '/tmp/consultant.txt' (FUNC PigStorage ','))) (STATEMENT B (foreach A (FOREACH_PLAN_SIMPLE (generate $0 (FIELD_DEF id))))))
    
    12569 [main] DEBUG org.apache.pig.builtin.JsonMetadata  - Could not find schema file for /tmp/consultant.txt
    wait end job complete
    job=FAILED
    job=true
    traitement terminé
    fin
    42606 [Thread-0] DEBUG org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - Receive kill signal
    I feel I'm not far off; more digging needed, but I'm used to it. I'll be back.

    To be continued

  20. #20
    Here I am again.

    Well, I finally found it. It's not that simple to explain in the context of pig jobs: it is indeed a configuration problem, but one that has to be defined correctly on the client side and mirrored on the cluster side, notably for the classpath. I had already hit this with remote java map reduce; for remote pig jobs even more information has to be provided.

    First, you need to have locally the configuration files of the cluster you want to connect to (a sketch of how the client can load them follows the XML below):

    core-site.xml
    hdfs-site.xml
    log4j.properties
    mapred-site.xml
    yarn-site.xml

    But a few extra elements have to be added to the local/cluster configurations to allow remote tasks to be executed in the cluster.

    mapred-site.xml

    The classpath (local/cluster) and cross-platform support (local client) have to be added:

    <property>
    <name>mapreduce.application.classpath</name>
    <value>
    /usr/local/hadoop/etc/hadoop/*,
    /usr/local/hadoop/share/hadoop/common/*,
    /usr/local/hadoop/share/hadoop/common/lib/*,
    /usr/local/hadoop/share/hadoop/hdfs/*,
    /usr/local/hadoop/share/hadoop/hdfs/lib/*,
    /usr/local/hadoop/share/hadoop/mapreduce/*,
    /usr/local/hadoop/share/hadoop/mapreduce/lib/*,
    /usr/local/hadoop/share/hadoop/yarn/*,
    /usr/local/hadoop/share/hadoop/yarn/lib/*
    </value>
    </property>

    <property>
    <name>mapred.remote.os</name>
    <value>Linux</value>
    <description>Remote MapReduce framework's OS, can be either Linux or
    Windows
    </description>
    </property>

    <property>
    <name>mapreduce.app-submission.cross-platform</name>
    <value>true</value>
    </property>
    yarn-site.xml

    The classpath has to be added:

    <property>
    <name>yarn.application.classpath</name>
    <value>
    /usr/local/hadoop/etc/hadoop/*,
    /usr/local/hadoop/share/hadoop/common/*,
    /usr/local/hadoop/share/hadoop/common/lib/*,
    /usr/local/hadoop/share/hadoop/hdfs/*,
    /usr/local/hadoop/share/hadoop/hdfs/lib/*,
    /usr/local/hadoop/share/hadoop/mapreduce/*,
    /usr/local/hadoop/share/hadoop/mapreduce/lib/*,
    /usr/local/hadoop/share/hadoop/yarn/*,
    /usr/local/hadoop/share/hadoop/yarn/lib/*
    </value>
    </property>
    </configuration>
    When the yarn container runs, it must be able to find the jars it needs.

    Also make sure the job history server is started on all the nodes of the cluster, otherwise the tasks fail.
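
    As a side note, here is a minimal sketch (my own wiring; the C:/hadoop-conf paths are hypothetical) of how the client can load those copied files explicitly instead of repeating conf.set(...) calls. Putting the folder on the Eclipse project classpath also works, since Hadoop picks up the *-site.xml files from the classpath.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;

    public class ClusterConf {

        // Load the cluster configuration files copied onto the client machine.
        // The paths below are hypothetical; adapt them to where the copies live.
        public static Configuration load() {
            Configuration conf = new Configuration();
            conf.addResource(new Path("file:///C:/hadoop-conf/core-site.xml"));
            conf.addResource(new Path("file:///C:/hadoop-conf/hdfs-site.xml"));
            conf.addResource(new Path("file:///C:/hadoop-conf/mapred-site.xml"));
            conf.addResource(new Path("file:///C:/hadoop-conf/yarn-site.xml"));
            return conf;
        }
    }

    The resulting Configuration can then be passed to Job.getInstance(conf, ...), and for pig the same key/value pairs can be copied into the Properties given to PigServer.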

    Here is my code, slightly modified.

    	import java.io.IOException;
    	
    	import java.util.List;
    	import java.util.Properties;
    	
    	import org.apache.pig.ExecType;
    	import org.apache.pig.PigServer;
    	import org.apache.pig.backend.executionengine.ExecException;
    	import org.apache.pig.backend.executionengine.ExecJob;
    		
    	public class TestPigRemote {
    	
    		public static class idmapreduce{
    		   public static void main(String[] args) throws IOException {
    		   
    		   System.out.println("début");
    			
    			 Properties props = new Properties();
    			 props.setProperty("fs.default.name", "hdfs://192.168.0.11:9000");
    			 props.setProperty("mapred.job.tracker", "192.168.0.11:54311");
    			 props.setProperty("mapreduce.jobhistory.address", "192.168.0.11:10020");
    		
    			 
    			 PigServer pigServer = new PigServer(ExecType.MAPREDUCE, props);
    			  
    			  
    		   try {
    			   
    		   
    			  runIdQueryBatch(pigServer, "/tmp/consultant.txt");
    			
    		     Thread.sleep(30000);
    	
    		   }
    		   catch(Exception e) {	
    			   e.printStackTrace();
    		   }
    		   if (pigServer.isBatchOn() ) {
    			   pigServer.shutdown();
    		   }
    		   
    		   System.out.println("fin");
    		}
    		   
    	
    		   
    		public static void runIdQueryBatch(PigServer pigServer, String inputFile) throws IOException {
    	
    			System.out.println("traitement lancement");
    			pigServer.setJobName("testPigRemoting");
    			
    			 pigServer.setBatchOn();
    			    pigServer.debugOn();
    			    pigServer.setValidateEachStatement(true);
    	
    		     runIdQuery( pigServer,  inputFile);
    		     
    		     List<ExecJob> jobs=pigServer.executeBatch(true);
    		     
    		     System.out.println("wait end job complete");
    		     for ( ExecJob job : jobs  ) {
    		    	 
    			     while ( job.hasCompleted() == false ) {
    			    	 try {
    						Thread.sleep(10000);
    					} catch (InterruptedException e) {
    						continue;
    					}
    			    	
    			     }
    	
    		    	 System.out.println("job="+job.getStatus());
    		    	 System.out.println("job="+job.getConfiguration());
    		    	 System.out.println("job="+job.hasCompleted());
    		    	 
    		    	 if (  job.getException() != null ) {
    		    		 job.getException().printStackTrace();
    		    	 }
    		    	 
    		     }
    	
    		     
    		     System.out.println("traitement terminé");
    		   }
    		}
    		
    		   
    		   public static void runIdQuery(PigServer pigServer, String inputFile) throws IOException {
    				pigServer.setValidateEachStatement(true);
    			     pigServer.registerQuery("A = load '"+inputFile+"' using PigStorage(',');");
    			     pigServer.registerQuery("B = foreach A generate $0 as id;");
    			     pigServer.store("B", "/tmp/idout");
    			   
    			}
    	}

    Success logs
    début
    log4j:WARN No appenders could be found for logger (org.apache.hadoop.conf.Configuration.deprecation).
    log4j:WARN Please initialize the log4j system properly.
    log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
    2    [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine  - Connecting to hadoop file system at: hdfs://192.168.0.11:9000
    696  [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine  - Connecting to map-reduce job tracker at: 192.168.0.11:54311
    traitement lancement
    838  [main] DEBUG org.apache.pig.parser.QueryParserDriver  - Original macro AST:
    (QUERY (STATEMENT A (load '/tmp/consultant.txt' (FUNC PigStorage ','))))
    
    838  [main] DEBUG org.apache.pig.parser.QueryParserDriver  - macro AST after import:
    (QUERY (STATEMENT A (load '/tmp/consultant.txt' (FUNC PigStorage ','))))
    
    838  [main] DEBUG org.apache.pig.parser.QueryParserDriver  - Resulting macro AST:
    (QUERY (STATEMENT A (load '/tmp/consultant.txt' (FUNC PigStorage ','))))
    
    6002 [main] DEBUG org.apache.pig.builtin.JsonMetadata  - Could not find schema file for /tmp/consultant.txt
    6025 [main] DEBUG org.apache.pig.JVMReuseImpl  - Method cleanupStaticData in class class org.apache.pig.impl.util.UDFContext registered for static data cleanup
    6025 [main] DEBUG org.apache.pig.parser.QueryParserDriver  - Original macro AST:
    (QUERY (STATEMENT A (load '/tmp/consultant.txt' (FUNC PigStorage ','))))
    
    6025 [main] DEBUG org.apache.pig.parser.QueryParserDriver  - macro AST after import:
    (QUERY (STATEMENT A (load '/tmp/consultant.txt' (FUNC PigStorage ','))))
    
    6025 [main] DEBUG org.apache.pig.parser.QueryParserDriver  - Resulting macro AST:
    (QUERY (STATEMENT A (load '/tmp/consultant.txt' (FUNC PigStorage ','))))
    
    6037 [main] DEBUG org.apache.pig.builtin.JsonMetadata  - Could not find schema file for /tmp/consultant.txt
    6043 [main] DEBUG org.apache.pig.parser.QueryParserDriver  - Original macro AST:
    (QUERY (STATEMENT A (load '/tmp/consultant.txt' (FUNC PigStorage ','))) (STATEMENT B (foreach A (FOREACH_PLAN_SIMPLE (generate $0 (FIELD_DEF id))))))
    
    6043 [main] DEBUG org.apache.pig.parser.QueryParserDriver  - macro AST after import:
    (QUERY (STATEMENT A (load '/tmp/consultant.txt' (FUNC PigStorage ','))) (STATEMENT B (foreach A (FOREACH_PLAN_SIMPLE (generate $0 (FIELD_DEF id))))))
    
    6043 [main] DEBUG org.apache.pig.parser.QueryParserDriver  - Resulting macro AST:
    (QUERY (STATEMENT A (load '/tmp/consultant.txt' (FUNC PigStorage ','))) (STATEMENT B (foreach A (FOREACH_PLAN_SIMPLE (generate $0 (FIELD_DEF id))))))
    
    6060 [main] DEBUG org.apache.pig.builtin.JsonMetadata  - Could not find schema file for /tmp/consultant.txt
    6087 [main] DEBUG org.apache.pig.parser.QueryParserDriver  - Original macro AST:
    (QUERY (STATEMENT A (load '/tmp/consultant.txt' (FUNC PigStorage ','))) (STATEMENT B (foreach A (FOREACH_PLAN_SIMPLE (generate $0 (FIELD_DEF id))))))
    
    6087 [main] DEBUG org.apache.pig.parser.QueryParserDriver  - macro AST after import:
    (QUERY (STATEMENT A (load '/tmp/consultant.txt' (FUNC PigStorage ','))) (STATEMENT B (foreach A (FOREACH_PLAN_SIMPLE (generate $0 (FIELD_DEF id))))))
    
    6087 [main] DEBUG org.apache.pig.parser.QueryParserDriver  - Resulting macro AST:
    (QUERY (STATEMENT A (load '/tmp/consultant.txt' (FUNC PigStorage ','))) (STATEMENT B (foreach A (FOREACH_PLAN_SIMPLE (generate $0 (FIELD_DEF id))))))
    
    6099 [main] DEBUG org.apache.pig.builtin.JsonMetadata  - Could not find schema file for /tmp/consultant.txt
    6100 [main] DEBUG org.apache.pig.parser.QueryParserDriver  - Original macro AST:
    (QUERY (STATEMENT A (load '/tmp/consultant.txt' (FUNC PigStorage ','))) (STATEMENT B (foreach A (FOREACH_PLAN_SIMPLE (generate $0 (FIELD_DEF id))))))
    
    6100 [main] DEBUG org.apache.pig.parser.QueryParserDriver  - macro AST after import:
    (QUERY (STATEMENT A (load '/tmp/consultant.txt' (FUNC PigStorage ','))) (STATEMENT B (foreach A (FOREACH_PLAN_SIMPLE (generate $0 (FIELD_DEF id))))))
    
    6100 [main] DEBUG org.apache.pig.parser.QueryParserDriver  - Resulting macro AST:
    (QUERY (STATEMENT A (load '/tmp/consultant.txt' (FUNC PigStorage ','))) (STATEMENT B (foreach A (FOREACH_PLAN_SIMPLE (generate $0 (FIELD_DEF id))))))
    
    6110 [main] DEBUG org.apache.pig.builtin.JsonMetadata  - Could not find schema file for /tmp/consultant.txt
    6123 [main] INFO  org.apache.pig.tools.pigstats.ScriptState  - Pig features used in the script: UNKNOWN
    6160 [main] INFO  org.apache.pig.data.SchemaTupleBackend  - Key [pig.schematuple] was not set... will not generate code.
    6186 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer  - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
    6232 [main] DEBUG org.apache.pig.JVMReuseImpl  - Method staticDataCleanup in class class org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator registered for static data cleanup
    6251 [main] DEBUG org.apache.pig.impl.util.SpillableMemoryManager  - Found heap (Code Cache) of type Non-heap memory
    6251 [main] DEBUG org.apache.pig.impl.util.SpillableMemoryManager  - Found heap (PS Eden Space) of type Heap memory
    6251 [main] DEBUG org.apache.pig.impl.util.SpillableMemoryManager  - Found heap (PS Survivor Space) of type Heap memory
    6251 [main] DEBUG org.apache.pig.impl.util.SpillableMemoryManager  - Found heap (PS Old Gen) of type Heap memory
    6251 [main] DEBUG org.apache.pig.impl.util.SpillableMemoryManager  - Found heap (PS Perm Gen) of type Non-heap memory
    6251 [main] DEBUG org.apache.pig.impl.util.SpillableMemoryManager  - Selected heap to monitor (PS Old Gen)
    6253 [main] DEBUG org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema  - t: 50 Bag: 120 tuple: 110
    6256 [main] DEBUG org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema  - t: 50 Bag: 120 tuple: 110
    6258 [main] DEBUG org.apache.pig.data.SchemaTupleFrontend  - Registering Schema for generation [{bytearray}] with id [0] and context: FOREACH
    6261 [main] DEBUG org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema  - t: 50 Bag: 120 tuple: 110
    6278 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler  - File concatenation threshold: 100 optimistic? false
    6291 [main] DEBUG org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SampleOptimizer  - Not a sampling job.
    6296 [main] DEBUG org.apache.pig.backend.hadoop.executionengine.util.SecondaryKeyOptimizerUtil  - Cannot find POLocalRearrange or POUnion in map leaf, skip secondary key optimizing
    6303 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer  - MR plan size before optimization: 1
    6303 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer  - MR plan size after optimization: 1
    6511 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.MRScriptState  - Pig script settings are added to the job
    6517 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler  - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
    6518 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler  - This job cannot be converted run in-process
    6788 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler  - Added jar file:/C:/Users/bordi/.m2/repository/org/apache/pig/pig/0.14.0-SNAPSHOT/pig-0.14.0-SNAPSHOT.jar to DistributedCache through /tmp/temp321181890/tmp-517751113/pig-0.14.0-SNAPSHOT.jar
    6826 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler  - Added jar file:/C:/Users/bordi/.m2/repository/dk/brics/automaton/automaton/1.11-8/automaton-1.11-8.jar to DistributedCache through /tmp/temp321181890/tmp-488374958/automaton-1.11-8.jar
    6859 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler  - Added jar file:/C:/Users/bordi/.m2/repository/org/antlr/antlr-runtime/3.4/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp321181890/tmp-1730105890/antlr-runtime-3.4.jar
    6917 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler  - Added jar file:/C:/Users/bordi/.m2/repository/com/google/guava/guava/11.0.2/guava-11.0.2.jar to DistributedCache through /tmp/temp321181890/tmp796875788/guava-11.0.2.jar
    6959 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler  - Added jar file:/C:/Users/bordi/.m2/repository/joda-time/joda-time/2.8.1/joda-time-2.8.1.jar to DistributedCache through /tmp/temp321181890/tmp436826936/joda-time-2.8.1.jar
    6991 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler  - Setting up single store job
    6997 [main] DEBUG org.apache.pig.data.SchemaTupleFrontend  - Temporary directory for generated code created: C:\Users\bordi\AppData\Local\Temp\1434888731852-0
    6997 [main] INFO  org.apache.pig.data.SchemaTupleFrontend  - Key [pig.schematuple] is false, will not generate code.
    6997 [main] INFO  org.apache.pig.data.SchemaTupleFrontend  - Starting process to move generated code to distributed cacche
    6997 [main] INFO  org.apache.pig.data.SchemaTupleFrontend  - Setting key [pig.schematuple.classes] with classes to deserialize []
    7021 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - 1 map-reduce job(s) waiting for submission.
    7023 [JobControl] DEBUG org.apache.pig.backend.hadoop23.PigJobControl  - Checking state of job job name:	
    job id:	job_pigexec_0
    job state:	WAITING
    job mapred id:	null
    job message:	just initialized
    job has no depending job:	
    
    7277 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil  - Total input paths to process : 1
    7291 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil  - Total input paths (combined) to process : 1
    7698 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - HadoopJobId: job_1434886974841_0006
    7698 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - Processing aliases A,B
    7698 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - detailed locations: M: A[1,4],B[1,57] C:  R: 
    7704 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - 0% complete
    7705 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - Running jobs are [job_1434886974841_0006]
    12698 [JobControl] DEBUG org.apache.pig.backend.hadoop23.PigJobControl  - Checking state of job job name:	N/A
    job id:	job_pigexec_0
    job state:	RUNNING
    job mapred id:	job_1434886974841_0006
    job message:	just initialized
    job has no depending job:	
    
    17597 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - 100% complete
    17602 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.SimplePigStats  - Script Statistics: 
    
    HadoopVersion	PigVersion	UserId	StartedAt	FinishedAt	Features
    2.6.0	0.14.0-SNAPSHOT	bordi	2015-06-21 14:12:11	2015-06-21 14:12:22	UNKNOWN
    
    Success!
    
    Job Stats (time in seconds):
    JobId	Maps	Reduces	MaxMapTime	MinMapTime	AvgMapTime	MedianMapTime	MaxReduceTime	MinReduceTime	AvgReduceTime	MedianReducetime	Alias	Feature	Outputs
    job_1434886974841_0006	1	0	2	2	2	2	0	0	0	0	A,B	MAP_ONLY	/tmp/idout,
    
    Input(s):
    Successfully read 3 records (466 bytes) from: "/tmp/consultant.txt"
    
    Output(s):
    Successfully stored 3 records (6 bytes) in: "/tmp/idout"
    
    Counters:
    Total records written : 3
    Total bytes written : 6
    Spillable Memory Manager spill count : 0
    Total bags proactively spilled: 0
    Total records proactively spilled: 0
    
    Job DAG:
    job_1434886974841_0006
    
    
    17632 [main] DEBUG org.apache.pig.backend.hadoop.executionengine.Launcher  - Error message from task (map) task_1434886974841_0006_m_000000
    17656 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - Success!
    17658 [main] DEBUG org.apache.pig.parser.QueryParserDriver  - Original macro AST:
    (QUERY (STATEMENT A (load '/tmp/consultant.txt' (FUNC PigStorage ','))) (STATEMENT B (foreach A (FOREACH_PLAN_SIMPLE (generate $0 (FIELD_DEF id))))))
    
    17658 [main] DEBUG org.apache.pig.parser.QueryParserDriver  - macro AST after import:
    (QUERY (STATEMENT A (load '/tmp/consultant.txt' (FUNC PigStorage ','))) (STATEMENT B (foreach A (FOREACH_PLAN_SIMPLE (generate $0 (FIELD_DEF id))))))
    
    17658 [main] DEBUG org.apache.pig.parser.QueryParserDriver  - Resulting macro AST:
    (QUERY (STATEMENT A (load '/tmp/consultant.txt' (FUNC PigStorage ','))) (STATEMENT B (foreach A (FOREACH_PLAN_SIMPLE (generate $0 (FIELD_DEF id))))))
    
    17667 [main] DEBUG org.apache.pig.builtin.JsonMetadata  - Could not find schema file for /tmp/consultant.txt
    wait end job complete
    job=COMPLETED
    job=true
    traitement terminé
    fin
    47701 [Thread-0] DEBUG org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - Receive kill signal
    The question this raises is that the client needs the cluster's configuration files locally in order to pass them back to the cluster and run the job in its container, including the classpath of the Hadoop jars. It is a rather odd mechanism.
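
    One possible way to avoid copying the *-site.xml files to the client, sketched below under the assumption that the PigServer(ExecType, Properties) constructor is used, is to set the cluster endpoints programmatically. The host and port values are the ones echoed in the log above; the property names are the classic fs.default.name / mapred.job.tracker keys, and a pure YARN setup would likely need additional keys such as yarn.resourcemanager.address and mapreduce.framework.name.

    // Sketch (assumption, not the code used above): point PigServer at the
    // cluster without relying on local Hadoop configuration files.
    Properties props = new Properties();
    props.setProperty("fs.default.name", "hdfs://192.168.0.11:9000");   // HDFS NameNode seen in the log
    props.setProperty("mapred.job.tracker", "192.168.0.11:54311");      // job tracker address seen in the log

    PigServer pigServer = new PigServer(ExecType.MAPREDUCE, props);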

    All that is left now is HBase, and that will wrap up this topic.
    To be continued
