写了一个Perl脚本来自动删除某一个作者在新语丝上的所有文章


所有跟贴·加跟贴·新语丝读书论坛

送交者: jhuang 于 2008-11-30, 04:23:15:

回答: 手工检索删除:-) 由 方舟子 于 2008-11-30, 01:52:45:

想想寻正的文章挺多的,删起来费事。将下面的脚本保存为deletejunks.pl, 假设新语丝的Web根目录是/var/www/localhost/htdocs/,那么输入
./deletejunks.pl /var/www/localhost/htdocs/ 西风独自凉
列出所有西风的文章的前面几行接着是文章的完整路径,这时候可以检查检查,看看是不是有错误。
要删除的时候,输入:
./deletejunks.pl /var/www/localhost/htdocs/ 西风独自凉 2>deletelist && xargs -a deletelist rm
注意locale的设置,我的机器上设置的是zh_CN.gb2312

#!/usr/bin/perl
use encoding 'gb2312' , STDIN => 'gb2312', STDOUT => 'gb2312';

sub delete_by_author($) {
my($txtfile) = @_;
my($is_author) = 0;
my($nonblank_line_number) = 1;
open ARTICLE, "<:encoding(gb2312)", $txtfile;
do {
$line = <ARTICLE>;
if ($line !~ /^\s+$/) {
print $line;
$nonblank_line_number ++;
if ($line =~ /$ARGV[1]$/) {
print {STDERR} $txtfile,"\n";
}
}
## search the top 4 non-blank lines in the article
} while $nonblank_line_number < 5;
close ARTICLE;
}

sub recurse($) {
my($path) = @_;

## append a trailing / if it's not there
$path .= '/' if($path !~ /\/$/);

for my $eachFile ( glob($path.'*') ) {
## if the file is a directory
if( -d $eachFile) {
## pass the directory to the routine ( recursion )
recurse($eachFile);
} else {
#delete_by_author($eachFile) if($eachFile ~ /\.txt$/);
if($eachFile =~ /\.txt$/) { delete_by_author($eachFile); }
}
}
}

if (! $ARGV[1]) {die "Usage: deletejunks.pl BaseDirectory Author \n";}

## initial call ... $ARGV[0] is the first command line argument
recurse($ARGV[0]);




所有跟贴:


加跟贴

笔名: 密码: 注册笔名请按这里

标题:

内容: (BBCode使用说明