Problem with the Perl script grab_tv_fr.pl

  1. #1
    Knowledgeable member
    Location: France
    Registered: April 2005
    Posts: 50
    Hello,

    I have a small problem with a grabber written in Perl. When I try to run it, I get this error:

    Can't call method "as_text" on an undefined value at ./tv_grab_fr.pl line 528.

    Does anyone have an idea how to resolve this error? Perhaps a module that isn't installed?

    Thanks in advance

  2. #2
    tfe
    Seasoned member
    Location: France
    Registered: November 2005
    Posts: 85
    Without the source it's hard to say.

    At first glance, you have an object variable that was never initialized.
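
    To illustrate, here is a minimal sketch of how that message arises. The Node class and find_node are made up for the example and are not part of tv_grab_fr; they only stand in for a lookup that returns undef when nothing matches.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a parsed HTML node; not part of tv_grab_fr.
package Node;
sub new     { my ($class, $text) = @_; return bless { text => $text }, $class }
sub as_text { return $_[0]{text} }

package main;

# Hypothetical lookup that returns undef when nothing matches,
# much as a failed search in a parsed tree would.
sub find_node {
    my ($found) = @_;
    return $found ? Node->new('13/04') : undef;
}

# Calling a method on undef dies; eval captures the message in $@.
my $missing = find_node(0);
my $text    = eval { $missing->as_text };
my $error   = $@;
print "error: $error";

# Checking that the value is defined before the call avoids the crash.
my $node = find_node(1);
my $safe = defined $node ? $node->as_text : '(no match)';
print "text: $safe\n";
```

    So the thing to look at is whatever is supposed to fill the variable just before line 528, and why it came back empty.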

  3. #3
    Knowledgeable member
    Location: France
    Registered: April 2005
    Posts: 50
    Well, actually it worked before and I haven't changed anything in the code (I didn't write it), so I would lean toward a module that isn't installed.
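
    One way to test that guess is to try loading the modules by hand. This is only a sketch: module_loads is a helper name invented for the example, and the module list is copied from a few of the script's "use" lines.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the named module can be loaded, else 0.
sub module_loads {
    my ($module) = @_;
    # String eval so that the module name is resolved at run time.
    return eval "require $module; 1" ? 1 : 0;
}

# A few of the modules the script pulls in; extend the list as needed.
my @modules = qw(HTML::TreeBuilder HTML::Entities Date::Manip XMLTV);
my @missing = grep { !module_loads($_) } @modules;

print @missing ? "missing: @missing\n" : "all listed modules load\n";
```

    Note that when a module named in a "use" line is missing, Perl normally stops at compile time with a "Can't locate ... in @INC" message, which is not the error shown above.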

    Here is the source:

    #!/usr/bin/perl -w
     
    eval 'exec /usr/bin/perl -w -S $0 ${1+"$@"}'
        if 0; # not running under some shell
     
    =head1 NAME
     
    tv_grab_fr - Grab TV listings for France.
     
    =head1 SYNOPSIS
     
    To configure: tv_grab_fr --configure [--config-file FILE] [--gui OPTION]
    To grab listings: tv_grab_fr [--output FILE] [--quiet]
    Slower, detailed grab: tv_grab_fr --slow [--output FILE] [--days N] [--offset N] [--quiet]
    Help: tv_grab_fr --help
     
    =head1 DESCRIPTION
     
    Output TV listings for several channels available in France (Hertzian,
    Cable/satellite, Canal+ Sat, TPS).  The data comes from
    telepoche.guidetele.com.  The default is to grab as many days as possible
    from the current day onwards. By default the program descriptions are
    not downloaded, so if you want descriptions and ratings, you should
    activate the --slow option.
     
    B<--configure> Grab channel information from the website and ask for
    channel type and names.
     
    B<--gui OPTION> Use this option to enable a graphical interface to be used.
    OPTION may be 'Tk', or left blank for the best available choice.
    Additional allowed values of OPTION are 'Term' for normal terminal output
    (default) and 'TermNoProgressBar' to disable the use of Term::ProgressBar.
     
    B<--output FILE> write to FILE rather than standard output.
     
    B<--days N> grab N days starting from today, rather than as many as
    possible. Due to the website organization, the speed is exactly the
    same, whatever the number of days is until you activate the --slow
    option.  So this option is ignored if --slow is not also given.
     
    B<--offset N> start grabbing N days from today, rather than starting
    today.  N may be negative. Due to the website organization, N cannot
    be less than -1.  As with --days, this is only useful for limiting
    downloads in --slow mode.
     
    B<--slow> Get additional information from the website, like program
    description, reviews and credits.
     
    B<--quiet> suppress the progress messages normally written to standard
    error.
     
    B<--help> print a help message and exit.
     
    =head1 SEE ALSO
     
    L<xmltv(5)>
     
    =head1 AUTHOR
     
    Sylvain Fabre, centraladmin@lahiette.com
    with some patches from :
      - Francois Gouget, fgouget@free.fr
      - Niel Markwick, nielm@bigfoot.com
     
    =cut
     
    # Todo: perhaps we should internationalize messages and docs?
    use XMLTV::Usage <<END
    $0: get French television listings in XMLTV format
    To configure: tv_grab_fr --configure [--config-file FILE]
    To grab listings: tv_grab_fr [--output FILE] [--quiet]
    Slower, detailed grab: tv_grab_fr --slow [--output FILE] [--days N] [--offset N] [--quiet]
    END
      ;
     
    use warnings;
    use strict;
    use XMLTV::Version '$Id: tv_grab_fr,v 1.20 2005/04/13 19:37:37 reudeudeu Exp $ ';
    use Getopt::Long;
    use HTML::TreeBuilder;
    use HTML::Entities; # parse entities
    use IO::File;
    use URI;
    use Date::Manip;
    use XMLTV;
    use XMLTV::Memoize;
    use XMLTV::Ask;
    use XMLTV::ProgressBar;
    use XMLTV::Mode;
    use XMLTV::Config_file;
    use XMLTV::DST;
    use XMLTV::Get_nice;
     
    # Override the infamous Get_nice delay value (note: this sets it to 1, not 0)
    $XMLTV::Get_nice::Delay = 1;
     
    #***************************************************************************
    # Main declarations
    #***************************************************************************
    my $GRID_BASE_URL = 'http://telepoche.guidetele.com/gtv/grille?openagent&d=2&h=6&b=';
    my $GRID_BY_CHANNEL = 'http://telepoche.guidetele.com/gtv/semaine?openagent&c=';
    my $SHEET_URL = "http://telepoche.guidetele.com/fiche/emi_";
    my $ROOT_URL  = "http://telepoche.guidetele.com";
    my $LANG = "fr";
    my $MAX_STARS = 4;
    my $MAX_RETRY = 5;
    my $VERSION   = "130405-01";
     
    # Temporary avoid XML warnings (to be investigated)
    no warnings;
     
    # Grid id defined by the website according to channel types (needed to build the URL)
    my %GridType = (  "HERTZIENNE" => "EMWD-66DGBM",
                      "TNT"        => "EMWD-6B2HZ3",
                      "CABLE/SAT"  => "EMWD-66DGCT",
                      "TPS"        => "EMWD-66DJQG",
                      "CANAL SAT"  => "EMWD-66DJEA",
                      "FREEBOX"    => "EMWD-66DJXL",
                      "ETRANGERES" => "EMWD-66DJAL" );
     
    # Slot of hours according to the website (needed to build the URL)
    my @offsets = (2, 3, 4, 5, 6, 7);
     
    #***************************************************************************
    # Global variables allocation according to options
    #***************************************************************************
    XMLTV::Memoize::check_argv('get_page_aux');
    my ($opt_days,  $opt_help,  $opt_output,  $opt_offset,  $opt_gui, $opt_quiet,  $opt_list_channels, $opt_config_file, $opt_configure, $opt_slow, $opt_licons);
    $opt_quiet  = 0;
    # The website is able to store up to nine days from now
    my $default_opt_days = 9;
    $opt_output = '-'; # standard output
    GetOptions('days=i'    => \$opt_days,
         'help'      => \$opt_help,
         'output=s'  => \$opt_output,
         'offset=i'  => \$opt_offset,
         'quiet'     => \$opt_quiet,
         'configure' => \$opt_configure,
         'config-file=s' => \$opt_config_file,
         'gui:s'     => \$opt_gui,
         'list-channels' => \$opt_list_channels,
         'slow' => \$opt_slow
        )
      or usage(0);
     
    #***************************************************************************
    # Options processing, warnings, checks and default parameters
    #***************************************************************************
    die 'Number of days must not be negative'  if (defined $opt_days && $opt_days < 0);
    die 'Cannot get more than one day before current day' if (defined $opt_offset && $opt_offset < -1);
    usage(1) if $opt_help;
     
    XMLTV::Ask::init($opt_gui);
     
    if (not $opt_slow) {
        # Certain options are ignored in fast mode.
        my %slow_options = (days => $opt_days,
                            offset => $opt_offset,
                           );
        foreach (sort keys %slow_options) {
            if (defined $slow_options{$_}) {
                say <<END
    In normal, fast grabbing mode all days are fetched at once, so the
    --$_ option does nothing.  The option is useful only for reducing
    the extra downloads caused by --slow mode.
    END
                  ;
            }
        }
    $opt_days = $default_opt_days;
    $opt_offset = 0;
    }
    else {
        # The options can be used, but we default them if not set.
        $opt_offset = 0 if not defined $opt_offset;
        $opt_days = $default_opt_days if not defined $opt_days;
    }
     
    if ( (($opt_offset + $opt_days) > $default_opt_days) or ($opt_offset > $default_opt_days) ) {
        $opt_days = $default_opt_days - $opt_offset;
        if ($opt_days < 0) {
            $opt_offset = 0;
            $opt_days = $default_opt_days;
        }
        say <<END
    The website does not handle more than $default_opt_days days.
    So the grabber is now configured with --offset $opt_offset --days $opt_days
    END
    ;
    }
     
    #***************************************************************************
    # Last init before doing real work
    #***************************************************************************
    my %results;
    my $lastdaysoffset = $opt_offset + $opt_days - 1;
     
    # Now detects if we are in configure mode
    my $mode = XMLTV::Mode::mode('grab', # default
                            $opt_configure => 'configure',
                            $opt_list_channels => 'list-channels');
     
    # File that stores which channels to download.
    my $config_file = XMLTV::Config_file::filename($opt_config_file, 'tv_grab_fr', $opt_quiet);
     
     
    #***************************************************************************
    # Sub sections
    #***************************************************************************
    sub get_channels( $ );
    sub process_channel_grid_page( $$$$ );
    sub debug_print( $ );
     
    # Set this to 1 if you want debug strings
    my $DEBUG_FR = 0;
    # Internal debug functions
    sub debug_print( $ ) {
      my $str = shift;
     
      if ($DEBUG_FR) { print $str; }
    }
     
    # Get a page using this agent.
    sub get_page( $ ) {
        my $url = shift;
        # For Memoize's sake make extra sure of scalar context
        #return scalar get_page_aux($url);
        #return get_nice($url);
        return scalar get_page_aux($url);
    }
     
    # Curious function to deal with the Get_nice API which does not offer an internal retry mode.
    # Awful, but it seems to work.
    # It works well, and it is mandatory with the telepoche website... Sorry for the ugly code...
    sub get_page_aux {
        my $url = shift;
        my $retry = $MAX_RETRY;
        my $got;
        my $sleep = 0;
     
    GET:
        # Sleep 1 second after 1 pass
        sleep $sleep;
        $sleep = 1;
        # Call the get_nice API
        eval { $got = get_nice($url) };
        # Then check the return string of the get_nice function
        goto GET if $@ and $@ =~ /could not fetch/ and --$retry;
     
        die "Can\'t download $url !!! Check your internet connection." if $retry == 0;
        return $got;
    }
     
    sub xmlencoding {
        # encode for xml
        $_[0] =~ s/</&lt;/g;
        $_[0] =~ s/>/&gt;/g;
        $_[0] =~ s/&/\%26/g;
        return $_[0];
    }
     
    my $warned_bad_chars;
    my $warned_unicode_chars;
    sub tidy {
        # clean bad characters from HTML
        for (my $s = shift) {
          tr/\205//d;
          tr/\222/''/;
          s/\234/oe/g;
          s/−/ /g;
     
          # Remove nasty characters, thanks to nielm
          s/&ldquo;|&rdquo;|&\#8219;|&\#8220;|&\#x201[89];/&quot/g;
          s/&lsquo;|&rsquo;|&\#8217;|&\#8218;|&\#x201[cdCD];/\'/g;
          s/&\#8230;|&\#x202[4-7];/.../g;
          s/&\#821[0123];|&\#x201[2-5];/-/g;
          s/&OElig;/OE/g;
          s/&oelig;/oe/g;
          s/œ/oe/g;
     
          if ( s/(&\#[0-9]{4,};)//g ) {
            print STDERR "removing unknown UNICODE characters: '$1'\n" unless $warned_unicode_chars++;
          }
          if ( s/(&\#x[0-9a-zA-Z]{3,};)//g ) {
            print STDERR "removing unknown UNICODE characters: '$1'\n" unless $warned_unicode_chars++;
          }
          # Not strictly a bad character but it does get in the way.
          s/&nbsp;/ /g;
          tr/\240/ /;
          tr/\t/ /;
     
          if (s/([^\012\015\040-\176\240-\377])//g) {
            print STDERR "removing bad characters: '$1'" unless $warned_bad_chars++;
          }
          return $_;
        }
    }
     
    #***************************************************************************
    # Configure mode
    #***************************************************************************
    if ($mode eq 'configure') {
        XMLTV::Config_file::check_no_overwrite($config_file);
        open(CONF, ">$config_file") or die "Cannot write to $config_file: $!";
     
        # Get a list of available channels, according to the grid type
        my @gts = sort keys %GridType;
        my @gtnames = map { $GridType{$_} } @gts;
        my @gtqs = map { "Get channels type : $_?" } @gts;
        my @gtwant = ask_many_boolean(1, @gtqs);
     
        my $bar = new XMLTV::ProgressBar('getting channel lists',
                                        scalar grep { $_ } @gtwant)
                        if not $opt_quiet;
        my %channels_for;
        foreach my $i (0 .. $#gts) {
            my ($gt, $gtw, $gtname) = ($gts[$i], $gtwant[$i], $gtnames[$i]);
            next if not $gtw;
            my %channels = get_channels( $gtname );
            die 'No channels could be found' if not %channels;
            $channels_for{$gt} = \%channels;
            update $bar if not $opt_quiet;
        }
        $bar->finish() if not $opt_quiet;
     
        my %asked;
        foreach (@gts) {
            my $gtw = shift @gtwant;
            my $gtname = shift @gtnames;
            if ($gtw) {
                my %channels = %{$channels_for{$_}};
                say "Channels for $_";
     
                # Ask about each channel (unless already asked).
                my @chs = grep { not $asked{$_}++ } sort keys %channels;
                my @names = map { $channels{$_}{name} } @chs;
                my @qs = map { "add channel $_?" } @names;
                my @want = ask_many_boolean(1, @qs);
                foreach (@chs) {
                    my $w = shift @want;
                    warn("cannot read input, stopping channel questions"), last if not defined $w;
                    # Print a config line, but comment it out if channel not wanted.
                    print CONF '#' if not $w;
                    print CONF "channel $_ $channels{$_}{name};$channels{$_}{icon}\n";
                }
            }
        }
        close CONF or warn "cannot close $config_file: $!";
        say("Finished configuration.");
        exit();
    }
     
    #***************************************************************************
    # Check mode checking and get configuration file
    #***************************************************************************
    die if $mode ne 'grab' and $mode ne 'list-channels';
     
    my @config_lines;
    if ($mode eq 'grab') {
        @config_lines = XMLTV::Config_file::read_lines($config_file);
    }
     
    #***************************************************************************
    # Prepare the XMLTV writer object
    #***************************************************************************
    my %w_args;
    if (defined $opt_output) {
        my $fh = new IO::File(">$opt_output");
        die "cannot write to $opt_output: $!" if not defined $fh;
        $w_args{OUTPUT} = $fh;
    }
     
    $w_args{encoding} = 'ISO-8859-1';
    my $writer = new XMLTV::Writer(%w_args);
    $writer->start
      ({ 'source-info-url'     => 'http://telepoche.guidetele.com/',
         'source-data-url'     => 'http://telepoche.guidetele.com/',
         'generator-info-name' => 'XMLTV',
         'generator-info-url'  => 'http://membled.com/work/apps/xmltv/',
       });
     
    #***************************************************************************
    # List channels only case
    #***************************************************************************
    if ($opt_list_channels) {
        # Get a list of available channels, according to the grid type
        my @gts = sort keys %GridType;
        my @gtnames = map { $GridType{$_} } @gts;
        my @gtqs = map { "List channels for grid : $_?" } @gts;
        my @gtwant = ask_many_boolean(1, @gtqs);
     
        foreach (@gts) {
            my $gtw = shift @gtwant;
            my $gtname = shift @gtnames;
            if ($gtw) {
                say  "Now getting grid : $_ \n";
                my %channels = get_channels( $gtname );
                die 'no channels could be found' if (scalar(keys(%channels)) == 0);
                foreach my $ch_did (sort(keys %channels)) {
                    my $ch_xid = "C".$ch_did."telepoche.com";
                    $writer->write_channel({ id => $ch_xid,
                                             'display-name' => [ [ $channels{$ch_did}{name} ] ],
                                             'icon' => [{src=>$ROOT_URL.$channels{$ch_did}{icon}}] });
                }
           }
         }
         $writer->end();
         exit();
    }
     
    #***************************************************************************
    # Now the real grabbing work
    #***************************************************************************
    die if $mode ne 'grab';
     
    #***************************************************************************
    # Build the working list of channel name/channel id
    #***************************************************************************
    my (%channels, $chicon, $chid, $chname);
    my $line_num = 1;
    foreach (@config_lines) {
        ++ $line_num;
        next if not defined;
     
        # Here we store the Channel name with the ID in the config file, as the XMLTV id = Website ID
        if (/^channel:?\s+(\S+)\s+([^\#]+);([^\#]+)/) {
            $chid = $1;
            $chname = $2;
            $chicon = $3;
            $chname =~ s/\s*$//;
            $channels{$chid} = {'name'=>$chname, 'icon'=>$chicon};
        } else {
            warn "$config_file:$line_num: bad line $_\n";
        }
    }
     
    #***************************************************************************
    # Now process the days by getting the main grids.
    #***************************************************************************
    my @to_get;
    warn "No working channels configured, so no listings\n" if not %channels;
    my $script_duration = time();
     
    # The website stores channel information by hour area for a whole week !
    foreach $chid (sort keys %channels) {
        $writer->write_channel({ id => "C".$chid.".telepoche.com", 'display-name' => [[$channels{$chid}{name}]], 'icon' => [{src=>$ROOT_URL.$channels{$chid}{icon}}]});
        foreach (@offsets) {
            my $url = $GRID_BY_CHANNEL . "$chid&h=$_";
            push @to_get, [ $url, $chid, $_ ];
        }
    }
    my $bar = new XMLTV::ProgressBar('getting listings', scalar @to_get)  if not $opt_quiet;
    Date_Init("TZ=UTC");
     
    foreach (@to_get) {
        my ($url, $chid, $slot) = @$_;
        #my $th = threads->new(\&process_channel_grid_page, $writer, $chid, $url, $slot);
        #$th->join();
        process_channel_grid_page($writer, $chid, $url, $slot);
        update $bar if not $opt_quiet;
    }
    $writer->end();
    $bar->finish() if not $opt_quiet;
     
    # Print the duration
    $script_duration = time() - $script_duration;
    print STDERR "Grabber process finished in " . $script_duration . " seconds.\n" if not $opt_quiet;
     
    #***************************************************************************
    # Specific functions for grabbing information
    #***************************************************************************
    sub get_channels( $ ) {
        my $gridid = shift;
        my %channels;
        my $url = $GRID_BASE_URL.$gridid;
     
        my $t = HTML::TreeBuilder->new;
        $t->parse(tidy(get_page($url)));
        $t->eof;
        debug_print( "URL  : " . $url ."\n");
        foreach my $cellTree ( $t->look_down( "_tag", "td", "width", "50", "height", "62" ) ) {
          my $chid = $cellTree->look_down( "_tag", "a", 'href', '#' )->attr('onclick');
          if ( $chid =~ /goChaine\('(.*)','(.*)',''\);/ ) {
            $chid = $1;
            my $imgCell = $cellTree->look_down( "_tag", "img" );
            my $chname = $imgCell->attr('src');
            $chname =~ s/\/c_img\/chaine\///;
            $chname =~ s/\.gif//;
            debug_print "Found channel : $chid - " . $chname . "\n";
            $channels{$chid} = {'name' =>  $chname, 'icon' => $imgCell->attr('src') };
          }
        }
        $t->delete(); undef $t;
        return %channels;
      }
     
    sub process_channel_grid_page( $$$$ ) {
        my ($writer, $chid, $url, $slot) = @_;
        my ($genre, $showview, $hours, $starthour, $endhour, $date, $dateindex) = 0;
        my ($title, $subgenre, $footext, $star_rating, $datecreate) = 0;
     
        # Get the current page
        my $t = HTML::TreeBuilder->new;
        debug_print("Now getting page : " . $url . "\n");
        $t->parse(tidy(get_page($url)));
        $t->eof;
     
        # Reset some working variables
        my $cont = 0;
        my $nbloop = 0;
     
        # Then process the page ...
        # Each day is encapsulated in a table with the following parameters :
        foreach my $tableTree ($t->look_down('_tag', 'table', 'width', '532', 'bgcolor', '#ffffff') ) {
          # The new website displays this table twice before the real content table, so skip the first two occurrences...
          $cont += 1;
          next if ($cont < 3);
          # Get the list of rows of the table
          my @dateRowTab = $tableTree->content_list();
          # Now loop thru rows
          foreach my $dateRow (@dateRowTab)  {
            # First row is the date
            my $dateTree = $dateRow->look_down('_tag', 'td', 'width', '50');
            my ($day, $month) = split (/\//,$dateTree->as_text); # PROBLEM HERE
            $date = ParseDate("$month/$day/".UnixDate(DateCalc("today","+$nbloop days"),"%Y"));
            $dateindex = UnixDate($date, "%Y%m%d");
     
            # We need to limit the number of days fetched in slow
            # mode, but in fast mode no limit is needed since
            # there is a single fetch for all days.
            if ($opt_slow) {
              next if Date_Cmp($dateindex, UnixDate(DateCalc("today", "+$opt_offset days"),"%Y%m%d")) < 0;
              next if Date_Cmp($dateindex, UnixDate(DateCalc("today", "+$lastdaysoffset days"),"%Y%m%d")) > 0;
            }
            $nbloop += 1;
            # Then the program information
            my $tabDay = $dateRow->look_down('_tag', 'td', 'width', '480', 'height', '62' );
            foreach my $progTree ($tabDay->look_down('_tag', 'a', 'onMouseout', 'hidemenu()') ) {
              my $text = $progTree->as_text();
              my $line = $progTree->attr('onMouseover');
              $line =~ (!m/drc\(([^""]+)\)/);
              $line =~ m/\'(.*)\',\'(.*)\'/;
              $title = $2;
              my $mydata = $1;
              next if ( $title eq 'Fin des programmes');
              ($hours, $genre, $showview) = split (/<br>/, $mydata);
              next if ( !$hours );
              # Process the title, sometimes a showview field is shown
              $title =~ s/^\d{7,8} //;
              $title =~ s/\\//g;
              if ($title =~ s/\s*([*]+)\s*$//) {
              my $n = length $1;
              if (0 < $n and $n <= $MAX_STARS) {
                $star_rating = $n;
              } elsif ($MAX_STARS < $n) {
                warn "too many stars ($n), expected at most $MAX_STARS\n";
              } else { die }
            }
            die if $title =~ /[*]$/;
            my ($language, $subtitles_language);
            for ($title) {
              s/\s+$//;
              if (s/\s+\(VO\)$//) {
                # Version originale - language is unknown but not
                # French.  There is no way to represent this in
                # the DTD.
                #
              }
              elsif (s/\s+\(VO sous-titr.e\)$//) {
                # Language unknown, but we know it has French
                # subtitles.
                #
                $subtitles_language = 'fr';
              }
              elsif (s/\s+\(VF\)$//) {
                # Version francaise.  The title may or may not be
                # translated.
                #
                $language = 'fr';
              }
            }
            # At this point, $title contains title and subtitle (if any),
            # separated by a '-'. We will try to split off the subtitle
            # further down
     
            # Process hours, they look like HHhMM
            ($starthour, $endhour)  = split("-", $hours);
            $starthour =~ s/h//g
              or die "Cannot detect start hour from website : $starthour \n";
            $endhour   =~ s/h//g
              or die "Cannot detect end hour from website : $endhour \n";
            # Process the start/stop dates
            my $start = $dateindex.$starthour."00";
            my $stop  = $dateindex.$endhour."00";
            # Dummy site : the slot 0-4 of day n is in fact the slot 0-4 for day n+1
            if ( $slot == 7 ) {
              my $myslot = substr($starthour, 0, 2);
              die if not $start;
              $start = &UnixDate(&DateCalc($start, "+1 day"), "%Y%m%d%H%M%S")
                if ($myslot >= 0 && $myslot < 4);
              die 'could not add one day to start time' if not $start;
              $stop  = &UnixDate(&DateCalc($stop, "+1 day"), "%Y%m%d%H%M%S");
              die 'could not add one day to stop time' if not $stop;
            }
            # Last check to see if start > stop
            if ( Date_Cmp($start, $stop) > 0 ) {
              $stop = &UnixDate(&DateCalc($stop, "+1 day"), "%Y%m%d%H%M%S");
              die 'could not add one day to stop time' if not $stop;
            }
            # Now set the proper timezone (WT/ST) according to current date
            die if not $start; die if not $stop;
            $start = utc_offset( $start, "+0100");
            $stop  = utc_offset( $stop , "+0100");
            # Now use the utf8 conversion (???)
            utf8::encode($title)  if (utf8::is_utf8($title) );
            my %prog = (channel  => "C".$chid.".telepoche.com",
                  title    => [ [ $title ] ],             # lang unknown
                  start    => $start,
                  stop     => $stop
                );
            debug_print("Found title : $title - $start - $stop \n");
            $prog{'star-rating'} = [ "$star_rating/$MAX_STARS" ]
              if defined $star_rating;
            for ($language) { $prog{language} = [ $_ ] if defined }
            for ($subtitles_language) {
              $prog{subtitles} = [ { type => 'onscreen',
                  language => [ $_ ] } ]
                if defined;
            }
            # Sometimes the genre is not set, so replace it by the showview field
            if (defined $genre and $genre =~ m/Showview : /) {
              $showview = $genre;
              undef $genre;
            }
            # Process the genre, subgenre and date if defined
            if  (defined $genre ) {
              ($genre, $datecreate) = split("-", $genre);
              ($genre, $subgenre)   = split(",", $genre);
              for ($genre) { s/^\s+//; s/\s+$// }
              if (defined $subgenre) {
                for ($subgenre) { s/^\s+//; s/\s+$// }
                # utf8 conversion...
                utf8::encode($genre) if (utf8::is_utf8($genre));
                utf8::encode($subgenre) if (utf8::is_utf8($subgenre));
                $prog{category} = [ [ xmlencoding($genre), $LANG ], [ xmlencoding($subgenre), $LANG ] ];
              } else {
                $prog{category} = [ [ xmlencoding($genre), $LANG ] ];
              }
              if (defined $datecreate) {
                for ($datecreate) { s/^\s+//; s/\s+$// }
                $prog{date} = $datecreate ;
              }
            }
            # Process the showview field
            if ( defined $showview ) {
              $showview =~ s/Showview : //;
              for ($showview) { s/^\s+//; s/\s+$// }
              $prog{showview} = $showview;
            }
     
            # Variables needed for the detailed information parsing
            my ($idesc, $tdesc, $imgdesc);
            # Now get program description if the longlisting option is set
            if ( $opt_slow && $progTree->attr('class') eq 'fiche' ) {
              my $id = $progTree->attr('onclick');
              my @desc;
              $id =~ /fiche\('(\d+)'\)/
                or die "expected fiche(x), got: $id";
              $id = $1;
              debug_print("Calling sheet URL : " . $SHEET_URL . $id . "\n");
              my $tfic = HTML::TreeBuilder->new;
              $tfic->parse(tidy(get_page($SHEET_URL . $id)));
              $tfic->eof;
     
              # This page's title tag contains the program title without
              # the sub-title. Use it to separate the two.
              my $ttitle;
              if ( $ttitle = $tfic->look_down('_tag', 'title') ) {
                my $htmltitle=$ttitle->as_text();
                if ($title =~ s/^\Q$htmltitle\E\s+-\s+//) {
                  $prog{'title'} = [ [ $htmltitle ] ];
                  $prog{'sub-title'} = [ [ $title ] ];
                }
              }
              # Get the duration and the year
              my ($length, $hour, $min, $year);
              if ( $tdesc = $tfic->look_down('_tag', 'td', 'width', '250', 'class', 'txt') ) {
                $length = $tdesc->as_text();
                if ( $length =~ s/ Durée : (\d+)h(\d+) AM - (\d+)// ) {
                  $hour = $1; $min = $2; $year = $3;
                  # guidetele.com is full of bugs ...
                  $hour = $hour - 12 if ($hour >= 12);
                  $prog{'length'} = ($hour * 3600) + ($min * 60);
                  $prog{'date'}   = $year;
                }
              }
     
              # Now get descriptions, summary, advices, actors and director
              my ($resume, $histoire, $avis);
              my ($nextIsResume,$nextIsHistory,$nextIsAvis,$nextIsDirector) = (0,0,0,0);
              my (@director, @actor);
              if ( $tdesc = $tfic->look_down('_tag', 'td', 'width', '396') ) {
                # Detect actors
                foreach my $actorcell ($tdesc->look_down('_tag', 'td', 'class', 'disActeur') ) {
                  push @actor, tidy($actorcell->as_text());
                }
                my @children = $tdesc->content_list();
                foreach my $desc (@children)  {
              # Strip a leading ' :' and a single trailing space
              $desc =~ s/^ ://;
              $desc =~ s/ $//;
              if ($nextIsDirector == 1 ) {
                push @director, tidy($desc);
                $nextIsDirector = 0;
                debug_print "FOUND DIRECTOR : " . tidy($desc) . " - $title - $id\n";
              }
              if ($nextIsResume == 1) {
                warn "RESUME seen twice\n" if defined $resume;
                $resume = tidy($desc);
                $nextIsResume = 0;
                debug_print "FOUND RESUME : $resume \n";
              }
              if ($nextIsHistory == 1 ) {
                warn "HISTOIRE seen twice\n" if defined $histoire;
                $histoire = tidy($desc);
                $nextIsHistory = 0;
                debug_print "FOUND HISTOIRE : $histoire \n";
              }
              if ($nextIsAvis == 1 ) {
                warn "AVIS seen twice\n" if defined $avis;
                $avis = tidy($desc);
                $nextIsAvis = 0;
                debug_print "FOUND AVIS : $avis \n";
              }
              if ( ref($desc) ) {
                $nextIsResume = 1 if ( $desc->as_text() eq "RESUME" );
                $nextIsHistory = 1 if  ( $desc->as_text() eq "HISTOIRE" );
                $nextIsAvis = 1 if  ( $desc->as_text() eq "AVIS" );
                $nextIsDirector = 1 if ($desc->as_text eq "Réalisateur" );
              }
            }
          }
     
          # RESUME is main definition, HISTOIRE shorter.
          foreach ($resume, $histoire) {
            push @{$prog{desc}}, [ $_, $LANG ] if defined and length;
          }
     
          # Add AVIS to the main description, or make a new desc
          # for it if there are none.
          if (defined $avis and length($avis) ) {
            if ($prog{desc}) {
              $prog{desc}->[0]->[0] .= "Critique : " . $avis;
            } else {
              push @{$prog{desc}}, [ $avis, $LANG ];
            }
          }
     
          if ($idesc = $tfic->look_down('_tag', 'table',  'width', '190', 'height', '100%') ) {
            if ($tdesc = $idesc->look_down('_tag', 'td', 'valign', 'top', 'align', 'center' ) ) {
              if ($imgdesc = $tdesc->look_down('_tag', 'img') ) {
                $prog{icon} = [ {'src' => $ROOT_URL.$imgdesc->attr('src') } ];
              }
            }
          }
          # Now push the credits section, if exists
          $prog{credits}{director } = \@director if @director;
          $prog{credits}{actor}     = \@actor if @actor;
        } else {
          # The text for the <a> tag contains the title without the
          # sub-title so we can use that to separate the two. However
          # the text for the <a> tag may have been truncated so it
          # fits the slot on the page. Also some titles may contain
          # a ' - '. Still the heuristic works very well.
          my $subtitle;
          if ($text =~ s/\.\.\.$//) {
            if ($title =~ s/^\Q$text\E([^-]+)\s+-\s+//) {
              $prog{'title'} = [ [ "$text$1" ] ];
              $prog{'sub-title'} = [ [ $title ] ];
            }
          }
          elsif ($title =~ s/^\Q$text\E\s+-\s+//) {
            $prog{'title'} = [ [ $text ] ];
            $prog{'sub-title'} = [ [ $title ] ];
          }
        }
     
        if ( !$results{$prog{start}.$chid} ) {
          $results{$prog{start}.$chid} = "1";
          $writer->write_programme(\%prog);
        }
      }
          }
        }
        $t->delete(); undef $t;
      }
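
The title handling near the top of this listing (trailing stars for the rating, then a "(VO)", "(VO sous-titrée)" or "(VF)" suffix) can be tried in isolation. Below is a minimal sketch, not the script's actual code: `parse_title` is a hypothetical helper name, and the value of `$MAX_STARS` is an assumption, since the real constant is defined elsewhere in tv_grab_fr.

```perl
use strict;
use warnings;

# Assumed value; the real constant is defined elsewhere in the script.
my $MAX_STARS = 4;

# Hypothetical helper (not part of tv_grab_fr): split a raw title into
# the bare title, the star rating and the language markers.
sub parse_title {
    my ($title) = @_;
    my %info;

    # Trailing asterisks encode the star rating, e.g. "Film ***".
    if ($title =~ s/\s*([*]+)\s*$//) {
        my $n = length $1;
        $info{star_rating} = $n if $n <= $MAX_STARS;
    }

    $title =~ s/\s+$//;
    if ($title =~ s/\s+\(VO sous-titr.e\)$//) {
        # Language unknown, but French subtitles are present.
        $info{subtitles_language} = 'fr';
    }
    elsif ($title =~ s/\s+\(VF\)$//) {
        # French version.
        $info{language} = 'fr';
    }
    else {
        # Original version: language unknown, just drop the marker.
        $title =~ s/\s+\(VO\)$//;
    }

    $info{title} = $title;
    return \%info;
}

my $info = parse_title('Le Grand Bleu (VF) **');
print "$info->{title} | $info->{language} | $info->{star_rating}\n";
# prints: Le Grand Bleu | fr | 2
```

Unlike the listing, this sketch silently ignores a rating above `$MAX_STARS` instead of warning, which keeps the example short.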

  4. #4
    Guest
    Have you tried putting the string in double quotes?

    Code:
    my ($day, $month) = split (/\//,"$dateTree->as_text");
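
    A note on this suggestion: Perl does not interpolate method calls inside double-quoted strings, so "$dateTree->as_text" would only interpolate $dateTree and leave ->as_text as literal text. The error message itself means the lookup that should have filled the variable returned undef, so one option is to test the result before calling as_text. A minimal sketch under stated assumptions: `FakeElement` is a hypothetical stand-in for an HTML::Element node (so the example runs without extra modules), and `safe_text` is a helper name invented here, not something from tv_grab_fr.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an HTML::Element node, so the sketch runs
# without HTML::TreeBuilder installed.
package FakeElement;
sub new     { my ($class, $text) = @_; return bless { text => $text }, $class }
sub as_text { return $_[0]{text} }

package main;

# Hypothetical helper: return the node's text, or undef when the lookup
# found nothing (for instance after the site changed its HTML).
sub safe_text {
    my ($node) = @_;
    return defined $node ? $node->as_text : undef;
}

# First iteration simulates a successful lookup, second a failed one.
for my $dateTree (FakeElement->new('27/11'), undef) {
    my $text = safe_text($dateTree);
    if (defined $text) {
        my ($day, $month) = split m{/}, $text;
        print "day=$day month=$month\n";
    }
    else {
        warn "date cell not found; the page layout may have changed\n";
    }
}
```

    With a guard like this the script reports which element is missing instead of dying at the method call, which points at a site change rather than a missing module.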

  5. #5
    Knowledgeable member · France · Joined July 2004 · 9 posts
    This is caused by a change on the telepoche website.

    You are better off getting the new version of tv_grab_fr dated 27 November 2005:

    http://cvs.sourceforge.net/viewcvs.py/xmltv/xmltv/grab/fr/

  6. #6
    Knowledgeable member · France · Joined April 2005 · 50 posts
    All good, it works now; it was because of the update.

    Thanks, everyone.

  7. #7
    GLDavid · Experienced member · France · Joined January 2003 · 2,892 posts
    Thread tagged as resolved.

    Thanks
    GLDavid
