PHP屏蔽蜘蛛訪問代碼及常用搜索引擎的HTTP_USER_AGENT
PHP屏蔽蜘蛛訪問代碼代碼:

常用搜索引擎名與 HTTP_USER_AGENT對(duì)應(yīng)值
百度baiduspider
谷歌googlebot
搜狗sogou
騰訊SOSOsosospider
雅虎slurp
有道youdaobot
Bingbingbot
MSNmsnbot
Alexais_archiver
function is_crawler() {
$userAgent = strtolower($_SERVER['HTTP_USER_AGENT']);
$spiders = array(
'Googlebot', // Google 爬蟲
'Baiduspider', // 百度爬蟲
'Yahoo! Slurp', // 雅虎爬蟲
'YodaoBot', // 有道爬蟲
'msnbot' // Bing爬蟲
// 更多爬蟲關(guān)鍵字
);
foreach ($spiders as $spider) {
$spider = strtolower($spider);
if (strpos($userAgent, $spider) !== false) {
return true;
}
}
return false;
}
下面的php代碼附帶了更多的蜘蛛標(biāo)識(shí)
function isCrawler() {
echo $agent= strtolower($_SERVER['HTTP_USER_AGENT']);
if (!empty($agent)) {
$spiderSite= array(
"TencentTraveler",
"Baiduspider+",
"BaiduGame",
"Googlebot",
"msnbot",
"Sosospider+",
"Sogou web spider",
"ia_archiver",
"Yahoo! Slurp",
"YoudaoBot",
"Yahoo Slurp",
"MSNBot",
"Java (Often spam bot)",
"BaiDuSpider",
"Voila",
"Yandex bot",
"BSpider",
"twiceler",
"Sogou Spider",
"Speedy Spider",
"Google AdSense",
"Heritrix",
"Python-urllib",
"Alexa (IA Archiver)",
"Ask",
"Exabot",
"Custo",
"OutfoxBot/YodaoBot",
"yacy",
"SurveyBot",
"legs",
"lwp-trivial",
"Nutch",
"StackRambler",
"The web archive (IA Archiver)",
"Perl tool",
"MJ12bot",
"Netcraft",
"MSIECrawler",
"WGet tools",
"larbin",
"Fish search",
);
foreach($spiderSite as $val) {
$str = strtolower($val);
if (strpos($agent, $str) !== false) {
return true;
}
}
} else {
return false;
}
}
if (isCrawler()){
echo "你好蜘蛛精!";
}
else{
echo "你不是蜘蛛精??!";
}
使用PHP實(shí)現(xiàn)蜘蛛訪問日志統(tǒng)計(jì)
$useragent = addslashes(strtolower($_SERVER['HTTP_USER_AGENT']));
if (strpos($useragent, 'googlebot')!== false){$bot = 'Google';}
elseif (strpos($useragent,'mediapartners-google') !== false){$bot = 'Google Adsense';}
elseif (strpos($useragent,'baiduspider') !== false){$bot = 'Baidu';}
elseif (strpos($useragent,'sogou spider') !== false){$bot = 'Sogou';}
elseif (strpos($useragent,'sogou web') !== false){$bot = 'Sogou web';}
elseif (strpos($useragent,'sosospider') !== false){$bot = 'SOSO';}
elseif (strpos($useragent,'360spider') !== false){$bot = '360Spider';}
elseif (strpos($useragent,'yahoo') !== false){$bot = 'Yahoo';}
elseif (strpos($useragent,'msn') !== false){$bot = 'MSN';}
elseif (strpos($useragent,'msnbot') !== false){$bot = 'msnbot';}
elseif (strpos($useragent,'sohu') !== false){$bot = 'Sohu';}
elseif (strpos($useragent,'yodaoBot') !== false){$bot = 'Yodao';}
elseif (strpos($useragent,'twiceler') !== false){$bot = 'Twiceler';}
elseif (strpos($useragent,'ia_archiver') !== false){$bot = 'Alexa_';}
elseif (strpos($useragent,'iaarchiver') !== false){$bot = 'Alexa';}
elseif (strpos($useragent,'slurp') !== false){$bot = '雅虎';}
elseif (strpos($useragent,'bot') !== false){$bot = '其它蜘蛛';}
if(isset($bot)){
$fp = @fopen('bot.txt','a');
fwrite($fp,date('Y-m-d H:i:s')."\t".$_SERVER["REMOTE_ADDR"]."\t".$bot."\t".'http://'.$_SERVER['SERVER_NAME'].$_SERVER["REQUEST_URI"]."\r\n");
fclose($fp);
}
相關(guān)文章
PHP中的str_repeat函數(shù)在JavaScript中的實(shí)現(xiàn)
PHP中有一個(gè)函數(shù):String str_repeat($str, num);挺好用的,在 本文為大家介紹下次函數(shù)在js中的實(shí)現(xiàn),感興趣的朋友可以參考下2013-09-09
PHP實(shí)現(xiàn)讀取Excel文件的記錄(二)
在前文中介紹的方法有些麻煩,因?yàn)楸仨氁虞d很多的文件。本文介紹的方法簡(jiǎn)單了很多,只需要加載兩個(gè)文件即可。需要的可以參考一下2022-03-03
PHP實(shí)現(xiàn)求連續(xù)子數(shù)組最大和問題2種解決方法
這篇文章主要介紹了PHP實(shí)現(xiàn)求連續(xù)子數(shù)組最大和問題2種解決方法,涉及php針對(duì)數(shù)組的遍歷、判斷、運(yùn)算等相關(guān)操作技巧,需要的朋友可以參考下2017-12-12
PHP實(shí)現(xiàn)防重復(fù)提交(防抖)的方法總結(jié)
當(dāng)涉及到處理表單提交或用戶點(diǎn)擊按鈕等操作時(shí),防抖(Debounce)是一種重要的技術(shù),它可以有效地防止不必要的重復(fù)操作,本文為大家整理了 PHP 中防抖的多種實(shí)現(xiàn)方法,需要的可以參考下2023-09-09
LINUX下PHP程序?qū)崿F(xiàn)WORD文件轉(zhuǎn)化為PDF文件的方法
這篇文章主要介紹了LINUX下PHP程序?qū)崿F(xiàn)WORD文件轉(zhuǎn)化為PDF文件的方法,涉及php針對(duì)Word文檔與pdf格式文件的相關(guān)操作技巧,需要的朋友可以參考下2016-05-05
php使用str_replace替換多維數(shù)組的實(shí)現(xiàn)方法分析
這篇文章主要介紹了php使用str_replace替換多維數(shù)組的實(shí)現(xiàn)方法,結(jié)合具體實(shí)例對(duì)比分析了php針對(duì)多維數(shù)組的遍歷與替換操作相關(guān)實(shí)現(xiàn)技巧與注意事項(xiàng),需要的朋友可以參考下2017-06-06

