Mine GO BOOM Hunch Hunch What What

Age:41 Joined: Aug 01 2002 Posts: 3615 Location: Las Vegas
Posted: Wed Oct 11, 2006 1:18 am
Post subject: Googlebot Searching
It seems that in the past month, Server Help got ranked much higher in Google's crawling algorithm. The bot visits daily now and grabs a lot more pages. Most of the time it isn't a problem, but sometimes it comes from a couple of different IPs at once and likes to grab lots of pages a second. The server hosting this site isn't that fancy (price was an important selling point), which is why you sometimes notice pages taking a while to load. Generally, this is because someone else on the same machine is thrashing the hard drive pretty badly. Even though it is a virtual server with dedicated CPU time, the biggest bottleneck is still the hard drive. So if someone else is reading or writing tons of data, it takes little CPU time, but the I/O wait is huge.
Before, you could usually tell by the Server Load or page creation time at the bottom of every page. See 50 pages served in the last 5 minutes? Googlebot wanted 45 of them at the same time. Now I've added a little bit of code that actually lets you know when the bot is browsing the forums. All the places that show who is currently online (front page, View Online, view forum) will now display GoogleBot once for each different IP it is coming from. See one, and it's generally not a problem. See two or more, and the site will be a bit slow for the next couple of minutes.
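A rough sketch of the same idea, counting distinct Googlebot IPs instead of listing each one. This is not the patch's code: count_googlebot_ips() is a hypothetical name, and it assumes phpBB 2's session_ip format of 8 hex digits per IPv4 address.

```php
<?php
// Hypothetical helper (not from the patch): count how many distinct
// session IPs reverse-resolve to a *.googlebot.com host name.
// Assumes phpBB 2's session_ip format: 8 hex digits, e.g. "42f9410c".
function count_googlebot_ips(array $session_ips, ?callable $reverse = null): int
{
    // Default to PHP's reverse DNS lookup; a parameter so it can be swapped out.
    $reverse = $reverse ?? 'gethostbyaddr';
    $googlebot_ips = [];
    foreach (array_unique($session_ips) as $hex_ip) {
        // pack('H*') turns the hex text into 4 raw bytes; inet_ntop() formats them.
        $dotted = inet_ntop(pack('H*', $hex_ip));
        if ($dotted === false) {
            continue; // not a valid encoded IPv4 address
        }
        $host = $reverse($dotted);
        if (is_string($host) && preg_match('/\.googlebot\.com$/', $host)) {
            $googlebot_ips[$dotted] = true;
        }
    }
    // Per the post: one is usually fine, two or more means a heavy crawl.
    return count($googlebot_ips);
}
```

Like the patch, this does one DNS lookup per IP on every page view, so on a slow shared server you would probably want to cache the results.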
Some of you would probably suggest just blocking the bot. Sure, that would keep spammers and such from finding their way here (they'll still find it, and I try my best to add protections), but Google's search is so much better than anything designed for phpBB. Sometimes even I use Google to search the forums instead of the built-in search engine. All I really want is for Googlebot to actually obey the Crawl-delay directive in robots.txt instead of guessing what it thinks is best.
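For anyone unfamiliar with it, Crawl-delay is a nonstandard robots.txt line asking a crawler to wait a number of seconds between requests. A minimal example (10 seconds is an arbitrary value); some crawlers honor it, but Googlebot does not, which is exactly the complaint here:

```text
User-agent: *
Crawl-delay: 10
```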
Why am I posting this? Someone would notice it and probably start a new thread about it, so I might as well explain up front instead of having others reply with LOL or other junk before I get a chance to. Want to do this yourself?
// $row is a row from phpBB's sessions query; session_ip is stored as
// 8 hex digits, so rebuild the dotted-quad IP one byte at a time.
$host_ip = $row['session_ip'];
$host_ip = hexdec(substr($row['session_ip'], 0, 2));
$host_ip .= '.';
$host_ip .= hexdec(substr($row['session_ip'], 2, 2));
$host_ip .= '.';
$host_ip .= hexdec(substr($row['session_ip'], 4, 2));
$host_ip .= '.';
$host_ip .= hexdec(substr($row['session_ip'], 6, 2));
// Reverse DNS lookup: Google's crawlers resolve to *.googlebot.com host names.
$hostname = gethostbyaddr($host_ip);
if (preg_match('/\.googlebot\.com$/', $hostname))
{
    // Display name used in the "who is online" lists.
    $username = "<font color=blue>G</font><font color=red>o</font><font color=red>o</font><font color=blue>g</font><font color=green>l</font><font color=red>e</font>Bot";
}
Sure, the hex-IP-to-dotted-IP conversion looks ugly, but it is good enough. It's not like PHP is a speedy language in the first place. I included it for others who want to throw it into their own phpBB forums. Where to include it? Ugh, you guys will probably screw it up. Just download the patch and apply it. Don't know how? Do it manually; it's a unified diff, so it's generally pretty easy to locate and import the parts you need.
Unified diff output of GoogleBot user display thingamajig
googlebotphpbb.zip - 0.76 KB
Last edited by Mine GO BOOM on Sat Oct 14, 2006 1:43 pm, edited 1 time in total
Mine GO BOOM Hunch Hunch What What

Posted: Wed Oct 11, 2006 2:57 am
In an attempt to fix the GoogleBot problem, I installed a sitemap mod. It's very simple; hopefully it will tell GoogleBot to crawl only new pages, which may keep it from looking at old ones over and over again.
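The mod itself isn't shown here, but the idea can be sketched: a sitemap lists URLs with a lastmod date so a crawler can tell which topics changed recently. This is a hypothetical PHP sketch in the sitemaps.org 0.9 format, not the mod's code; build_sitemap() and the $topics layout are made up for illustration. lastmod is only a hint, so whether it actually cuts down re-crawling is up to the crawler.

```php
<?php
// Hypothetical sketch: build a sitemaps.org (protocol 0.9) XML document.
// Each topic is ['url' => string, 'last_post' => int Unix timestamp].
function build_sitemap(array $topics): string
{
    $xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    foreach ($topics as $topic) {
        $xml .= "  <url>\n";
        // URLs must be XML-escaped, e.g. "&" becomes "&amp;".
        $xml .= '    <loc>' . htmlspecialchars($topic['url'], ENT_XML1 | ENT_QUOTES, 'UTF-8') . "</loc>\n";
        // W3C Datetime in UTC, e.g. 2006-10-11T09:57:00+00:00
        $xml .= '    <lastmod>' . gmdate('Y-m-d\TH:i:s+00:00', $topic['last_post']) . "</lastmod>\n";
        $xml .= "  </url>\n";
    }
    $xml .= "</urlset>\n";
    return $xml;
}
```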
Doc Flabby Server Help Squatter

Joined: Feb 26 2006 Posts: 636
Posted: Wed Oct 11, 2006 4:44 am
Some spamming bots pretend to be GoogleBot. It might not be Google's fault.
_________________
Rediscover online gaming. Get Subspace | STF The future...perhaps
Mine GO BOOM Hunch Hunch What What

Posted: Wed Oct 11, 2006 5:46 am
This is a reverse DNS lookup, not a user agent string. Technically, that can be faked too, and I should have it look the name back up to check that it resolves to the same IP address, but this doesn't grant the bot any special access anyway.
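The extra check described above is usually called forward-confirmed reverse DNS, and it is also how Google suggests verifying Googlebot. A sketch, not the patch's code: is_verified_googlebot() is a hypothetical name, it accepts only googlebot.com like the patch does (Google's own instructions also list google.com host names), and the resolver parameters exist only so the lookups can be swapped out.

```php
<?php
// Sketch of forward-confirmed reverse DNS for Googlebot: reverse-resolve the
// IP, require a *.googlebot.com name, then resolve that name forward and
// require the original IP to be among the results.
function is_verified_googlebot(string $ip, ?callable $reverse = null, ?callable $forward = null): bool
{
    $reverse = $reverse ?? 'gethostbyaddr';   // IP -> host name
    $forward = $forward ?? 'gethostbynamel';  // host name -> array of IPv4s, or false

    $host = $reverse($ip);
    // gethostbyaddr() returns the IP itself (or false) when there is no PTR record.
    if (!is_string($host) || !preg_match('/\.googlebot\.com$/i', $host)) {
        return false;
    }
    // A spoofer controls their own PTR record, but not the A records under
    // googlebot.com, so the forward lookup must point back at $ip.
    $addrs = $forward($host);
    return is_array($addrs) && in_array($ip, $addrs, true);
}
```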
And I do know about posing as GoogleBot. You'd be surprised: you can get free access to some sites just by changing your user agent string.
Cyan~Fire I'll count you!

Age:37 Joined: Jul 14 2003 Posts: 4608 Location: A Dream
Posted: Wed Oct 11, 2006 10:47 am
LOL
_________________
This help is informational only. No representation is made or warranty given as to its content. User assumes all risk of use. Cyan~Fire assumes no responsibility for any loss or delay resulting from such use.
Wise men STILL seek Him.
Solo Ace Yeah, I'm in touch with reality...we correspond from time to time.

Age:37 Joined: Feb 06 2004 Posts: 2583 Location: The Netherlands
Posted: Wed Oct 11, 2006 12:36 pm
OLO
</3 Google
Smong Server Help Squatter

Joined: 1043048991 Posts: 0x91E
Posted: Wed Oct 11, 2006 2:07 pm
_________________
ss news
woo.png - 0.8 KB
D1st0rt Miss Directed Wannabe

Age:37 Joined: Aug 31 2003 Posts: 2247 Location: Blacksburg, VA
Posted: Wed Oct 11, 2006 7:40 pm
It's copying you dude
K' You can win any war if you start a year early

Joined: Jul 13 2006 Posts: 271 Location: Southtown
Posted: Thu Oct 12, 2006 4:16 pm
I've been vigorously combating Google in an attempt to IP-block all of Google's bots site-wide.
It's rough.
Mine GO BOOM Hunch Hunch What What

Posted: Thu Oct 12, 2006 4:20 pm
Create a file called robots.txt and put it in the top-level folder for your domain (or subdomain) with the following:
User-agent: Googlebot
Disallow: /

Or, if you want to block every crawler:

User-agent: *
Disallow: /
Solo Ace Yeah, I'm in touch with reality...we correspond from time to time.

Posted: Fri Oct 13, 2006 11:38 am
Haha, omgz I gotta block google's ips!!1
Solo Ace Yeah, I'm in touch with reality...we correspond from time to time.

Posted: Sat Sep 08, 2007 1:39 pm
MGB, what the hell is up with your code?
$host_ip = $row['session_ip'];
$host_ip = hexdec(substr($row['session_ip'], 0, 2));
$host_ip .= '.';
$host_ip .= hexdec(substr($row['session_ip'], 2, 2));
$host_ip .= '.';
$host_ip .= hexdec(substr($row['session_ip'], 4, 2));
$host_ip .= '.';
$host_ip .= hexdec(substr($row['session_ip'], 6, 2));
It's exactly the same; it just looks better. But just use the predefined functions!
// script snippet from phpBB2/includes/functions.php
// line 343 - 353
function encode_ip($dotquad_ip)
{
    $ip_sep = explode('.', $dotquad_ip);
    return sprintf('%02x%02x%02x%02x', $ip_sep[0], $ip_sep[1], $ip_sep[2], $ip_sep[3]);
}
function decode_ip($int_ip)
{
    $hexipbang = explode('.', chunk_split($int_ip, 2, '.'));
    return hexdec($hexipbang[0]). '.' . hexdec($hexipbang[1]) . '.' . hexdec($hexipbang[2]) . '.' . hexdec($hexipbang[3]);
}
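PHP's built-ins can also do this conversion without any manual hex splitting. A sketch, not phpBB code: hex_to_dotted() and dotted_to_hex() are hypothetical names, and they should give the same results as decode_ip()/encode_ip() for the 8-hex-digit IPv4 strings phpBB 2 stores.

```php
<?php
// Hypothetical helpers equivalent to phpBB 2's decode_ip()/encode_ip()
// for IPv4 addresses stored as 8 hex digits.
function hex_to_dotted(string $hex_ip): string
{
    // pack('H*') converts "42f9410c" into the 4 raw bytes 0x42 0xf9 0x41 0x0c;
    // inet_ntop() formats 4 raw bytes as a dotted quad.
    $dotted = inet_ntop(pack('H*', $hex_ip));
    if ($dotted === false) {
        throw new InvalidArgumentException("Not an encoded IPv4 address: $hex_ip");
    }
    return $dotted;
}

function dotted_to_hex(string $dotted_ip): string
{
    // inet_pton() is the inverse: dotted quad -> 4 raw bytes.
    $packed = inet_pton($dotted_ip);
    if ($packed === false) {
        throw new InvalidArgumentException("Not a valid IP address: $dotted_ip");
    }
    return bin2hex($packed); // lowercase hex, like encode_ip()'s %02x
}
```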