Web Crawler In Perl

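For a school assignment I had to pick up web crawling in Perl in a very short time. After finishing it with an API, I found out the teacher wanted us to write the parser ourselves (damn, I should not have skipped class), so I wrote it a second time with regular expressions ( :cry: ). I probably won't need any of this again for a while, so I'm writing it down for my future self or anyone else who might need it.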

Readme

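This program crawls the movie showtime pages of three theaters in Tainan City, downloads the data, and parses it.
Because of the assignment requirements it skips some information (whether a showing is 3dMax and so on) and grabs only the movie titles and times. It is a very simple little program.
fetch the HTML -> parse out the information I want -> output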
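The three stages above can be sketched as three small subroutines. This is only a minimal sketch, not the program itself: the names `fetch_html`, `parse_showtimes`, and `format_record` are made up for illustration, the URL is a placeholder, and the fetch stage is stubbed with a tiny piece of hypothetical markup so the example runs without network access.

```perl
use strict;
use warnings;

# Stage 1: fetch. Takes a URL and returns the page body as a string.
# This stub returns canned HTML (hypothetical markup, not the real
# site's) so the sketch runs offline; a real version would do an HTTP GET.
sub fetch_html {
    my ($url) = @_;
    return '<li class="title">Movie A</li><li>10:30</li><li>13:05</li>';
}

# Stage 2: parse. Pulls out the title and every HH:MM time.
sub parse_showtimes {
    my ($html) = @_;
    my ($title) = $html =~ m!<li class="title">(.*?)</li>!s;
    my @times   = $html =~ m!\b(\d{2}:\d{2})\b!g;
    return { title => $title, times => \@times };
}

# Stage 3: output. One line for the title, then one line per showtime.
sub format_record {
    my ($rec) = @_;
    return join("\n", $rec->{title}, @{ $rec->{times} }) . "\n";
}

my $url = 'http://example.com/showtime';    # placeholder URL
print format_record(parse_showtimes(fetch_html($url)));
```

Keeping the stages separate means the parser can be tried against a saved HTML string without hitting the site every time.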

Sample code using regular expressions

use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

# One shared user agent with a 10-second timeout.
my $browser = LWP::UserAgent->new();
$browser->timeout(10);

# The three theater showtime pages.
crawler('http://www.atmovies.com.tw/showtime/t06607/a06/');
crawler('http://www.atmovies.com.tw/showtime/t06608/a06/');
crawler('http://www.atmovies.com.tw/showtime/t06609/a06/');

sub crawler {
    my ($URL) = @_;
    my $request  = HTTP::Request->new(GET => $URL);
    my $response = $browser->request($request);

    # Report the failure and stop instead of parsing an error page.
    if ($response->is_error()) {
        printf "%s\n", $response->status_line;
        return;
    }

    my $data = $response->content();

    # Each match is one movie block: $1 contains the title link,
    # $3 contains the showtimes.
    while ($data =~ m!<ul id="theaterShowtimeTable">(.*?)<ul>(.*?)</ul>(.*?)</ul>!gs) {
        my ($item, $otherparse) = ($1, $3);

        # [^>]* keeps the match inside the opening <a> tag.
        if ($item =~ m!<a href=[^>]*>(.*?)</a>!s) {
            print "$1\n";
        }

        # Print every HH:MM time found in the rest of the block.
        while ($otherparse =~ m!(\d\d:\d\d)!g) {
            print "$1\n";
        }
    }
}
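One thing to watch for when pulling a link's text out with a regular expression is a greedy `.*` inside the opening tag. The snippet below is a small illustration on a made-up line of HTML (the two links are hypothetical, not taken from the real page): the greedy pattern skips ahead to the last link on the line, while a pattern bounded with `[^>]*` stays inside the first tag.

```perl
use strict;
use warnings;

# Hypothetical input: two links on the same line.
my $line = '<a href="/film/1/">Movie A</a> <a href="/film/2/">Movie B</a>';

# Greedy: .* runs to the end of the line, then backtracks only as far
# as the last '>' that still lets the rest of the pattern match.
my ($greedy) = $line =~ m!<a href=.*>(.*?)</a>!;

# Bounded: [^>]* cannot cross the end of the opening <a> tag.
my ($bounded) = $line =~ m!<a href=[^>]*>(.*?)</a>!;

print "greedy:  $greedy\n";     # prints "greedy:  Movie B"
print "bounded: $bounded\n";    # prints "bounded: Movie A"
```

When each link sits on its own line the two patterns give the same answer, which is why this kind of bug tends to show up only after the page layout changes.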

Sample code using TreeBuilder

use strict;
use warnings;
use HTML::TreeBuilder;

# Emit UTF-8 so the Chinese movie titles print correctly.
binmode(STDOUT, ':encoding(UTF-8)');
binmode(STDERR, ':encoding(UTF-8)');

my $URL = 'http://www.atmovies.com.tw/showtime/t06607/a06/';

# Fetch and parse the page in one step.
my $tree = HTML::TreeBuilder->new_from_url($URL);

my @items = $tree->look_down('id', 'theaterShowtimeTable')
    or die "no items\n";

for my $item (@items) {
    my @movies = $item->look_down('_tag', 'li')
        or die "no movies\n";

    my $count = 0;
    for my $movie (@movies) {
        # attr() returns undef when the <li> has no class attribute.
        my $class = $movie->attr('class') // '';

        # Skip the 2nd to 4th <li> and the entries the assignment ignores.
        if (   ($count < 1 || $count > 3)
            && $class ne 'theaterElse'
            && $class ne 'filmVersion')
        {
            print $movie->as_text, "\n";
        }
        $count++;
    }
}

# Free the parse tree when done.
$tree->delete;
Author: 張書維