Server Help
Community forums for Subgame, ASSS, and bots
 
Googlebot Searching

 
Mine GO BOOM
Hunch Hunch
What What
Age: 41
Gender: Male
Joined: Aug 01 2002
Posts: 3615
Location: Las Vegas

Posted: Wed Oct 11, 2006 1:18 am    Post subject: Googlebot Searching

It seems that in the past month, Server Help got ranked much higher in Google's crawling algorithm. The bot visits here daily now and grabs a lot more pages. Most of the time that isn't a problem, but sometimes it comes from a couple of different IPs and grabs lots of pages a second. The server hosting this site isn't that fancy (price was an important selling point), as you may have noticed when pages take a while to load. Generally, this is because someone else on the same machine is thrashing the hard drive pretty badly. Even though it is a virtual server and gets dedicated CPU time, the biggest bottleneck is still the hard drive. So if someone else is reading or writing tons of data, it takes little CPU time but the I/O wait is huge.

Before, you could usually tell by the Server Load or page creation time at the bottom of every page. See 50 pages served in the last 5 minutes? Googlebot wanted 45 of them at the same time. Now I've added a little bit of code to actually let you know when the bot is browsing the forums. All the places that show who is currently online (front page, view online, view forums) will now display GoogleBot once for each different IP it is coming from. See one, and it's generally not a problem. See two or more, and the site will be a bit slow for the next couple of minutes.

Some of you would probably suggest just blocking the bot. Sure, it would keep spammers and such from finding the site (they'll still find it, and I try my best to add protections), but the search power of Google is so much better than anything designed for phpBB. Sometimes even I use Google to search the forums instead of the built-in search engine. All I really want is for Googlebot to actually obey Crawl-Delay instead of guessing what it thinks is best.

Why am I posting this? Someone will notice it and probably make a new thread about it, so I might as well do it up front instead of having others reply with LOL or other junk before I get a chance to explain. Want to do this yourself?
Code:
// phpBB stores the session IP as 8 hex digits, e.g. 7f000001
$host_ip = $row['session_ip'];

// Convert it to dotted form, one octet (two hex digits) at a time
$host_ip = hexdec(substr($row['session_ip'], 0, 2));
$host_ip .= '.';
$host_ip .= hexdec(substr($row['session_ip'], 2, 2));
$host_ip .= '.';
$host_ip .= hexdec(substr($row['session_ip'], 4, 2));
$host_ip .= '.';
$host_ip .= hexdec(substr($row['session_ip'], 6, 2));

// Reverse DNS lookup, then flag any host under googlebot.com
$hostname = gethostbyaddr($host_ip);
if (preg_match('/\.googlebot\.com$/', $hostname))
{
   $username = "<font color=blue>G</font><font color=red>o</font><font color=red>o</font><font color=blue>g</font><font color=green>l</font><font color=red>e</font>Bot";
}
Sure, the hex IP to dotted IP conversion looks ugly, but it is good enough. It's not like PHP is a speedy language in the first place. I included that for others who want to throw it into their own phpBB forums. Where to include it? Ugh, you guys will probably screw it up. Just download the patch and apply it. Don't know how? Do it manually: it is a unified diff, so it's generally pretty easy to locate and import the stuff you need.

Unified diff output of GoogleBot user display thingamajig
googlebotphpbb.zip - 0.76 KB


Last edited by Mine GO BOOM on Sat Oct 14, 2006 1:43 pm, edited 1 time in total

Mine GO BOOM
Posted: Wed Oct 11, 2006 2:57 am

In an attempt to fix the GoogleBot problem, I applied a sitemap mod. It's very simple, and it should hopefully tell GoogleBot to crawl only new pages, which may keep it from looking at old ones tons and tons of times.
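
The post doesn't say which sitemap mod was applied. As a rough illustration of the idea only, here is a sketch that builds a sitemap in the sitemaps.org 0.9 XML format (which may differ from what the mod emitted in 2006), with a lastmod date per topic as the hint a crawler can use to skip unchanged pages. The function name build_sitemap, the shape of the input array, and the example URL are all assumptions made for this example.

```php
<?php
// Hypothetical helper, not taken from any phpBB mod.
// $topics is a list of ['url' => string, 'last_post' => Unix timestamp].
function build_sitemap(array $topics): string
{
    $xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    foreach ($topics as $topic) {
        // Escape &, <, > and quotes so the URL is valid inside XML.
        $loc = htmlspecialchars($topic['url'], ENT_QUOTES, 'UTF-8');
        // lastmod tells a crawler when the page last changed (UTC date).
        $lastmod = gmdate('Y-m-d', $topic['last_post']);
        $xml .= "  <url><loc>$loc</loc><lastmod>$lastmod</lastmod></url>\n";
    }
    return $xml . "</urlset>\n";
}

// Made-up topic URL and timestamp, for demonstration only.
echo build_sitemap([
    ['url' => 'http://forum.example/viewtopic.php?t=1&start=15', 'last_post' => 1160547480],
]);
```

Whether a crawler actually honors lastmod is up to the crawler, so treat this as a hint rather than a guarantee.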

Doc Flabby
Server Help Squatter
Joined: Feb 26 2006
Posts: 636

Posted: Wed Oct 11, 2006 4:44 am

Some spamming bots pretend to be GoogleBot. It might not be Google's fault :P
_________________
Rediscover online gaming. Get Subspace | STF The future...perhaps

Mine GO BOOM
Posted: Wed Oct 11, 2006 5:46 am

This is a reverse DNS lookup, not a user agent string. Technically, they can fake that too, and I should have it resolve the name and check that it matches the IP address, but this doesn't grant the bot any special access.

And I do know about faking as a GoogleBot. You'd be surprised, you can get free access to some sites by just changing your user agent string.
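
The extra check mentioned above (resolve the name and see whether it matches the IP) could look something like this minimal sketch. The function name is_verified_googlebot and the two optional resolver parameters are assumptions made for this example; the resolvers default to PHP's gethostbyaddr() and gethostbynamel() and are only injectable so the logic can be tried without real DNS.

```php
<?php
// Sketch of a forward-confirmed reverse DNS check (hypothetical helper).
function is_verified_googlebot(string $ip, ?callable $reverse = null, ?callable $forward = null): bool
{
    $reverse = $reverse ?? 'gethostbyaddr';   // IP -> host name
    $forward = $forward ?? 'gethostbynamel';  // host name -> list of IPv4 addresses

    // Step 1: reverse lookup. gethostbyaddr() returns the IP unchanged when
    // it finds no name, and false on malformed input.
    $hostname = $reverse($ip);
    if (!is_string($hostname) || !preg_match('/\.googlebot\.com$/i', $hostname)) {
        return false;
    }

    // Step 2: forward lookup. The claimed name must resolve back to the same
    // IP, because a reverse record alone is set by whoever owns the IP block.
    $addresses = $forward($hostname);
    return is_array($addresses) && in_array($ip, $addresses, true);
}

// Offline demonstration with stub resolvers (names and addresses are made up).
$fakePtr = fn($ip) => 'crawl-1.googlebot.com';
$lookup  = fn($host) => ['192.0.2.10'];
var_dump(is_verified_googlebot('192.0.2.10', $fakePtr, $lookup));   // bool(true)
var_dump(is_verified_googlebot('198.51.100.7', $fakePtr, $lookup)); // bool(false)
```

The second call fails because the name does not resolve back to the visitor's IP, which is the faked case described above. Note that this costs two DNS lookups per session instead of one.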

Cyan~Fire
I'll count you!
Age: 37
Gender: Male
Joined: Jul 14 2003
Posts: 4608
Location: A Dream

Posted: Wed Oct 11, 2006 10:47 am

LOL
_________________
This help is informational only. No representation is made or warranty given as to its content. User assumes all risk of use. Cyan~Fire assumes no responsibility for any loss or delay resulting from such use.
Wise men STILL seek Him.

Solo Ace
Yeah, I'm in touch with reality...we correspond from time to time.
Age: 37
Gender: Male
Joined: Feb 06 2004
Posts: 2583
Location: The Netherlands

Posted: Wed Oct 11, 2006 12:36 pm

OLO

</3 Google

Smong
Server Help Squatter
Joined: 1043048991
Posts: 0x91E

Posted: Wed Oct 11, 2006 2:07 pm


_________________
ss news




woo.png - 0.8 KB

D1st0rt
Miss Directed Wannabe
Age: 37
Gender: Male
Joined: Aug 31 2003
Posts: 2247
Location: Blacksburg, VA

Posted: Wed Oct 11, 2006 7:40 pm

It's copying you dude



K'
You can win any war if you start a year early
Gender: Male
Joined: Jul 13 2006
Posts: 271
Location: Southtown

Posted: Thu Oct 12, 2006 4:16 pm

I've been vigorously combating Google in an attempt to IP block Google and all of its bots, web-wide, from getting access.
It's rough.

Mine GO BOOM
Posted: Thu Oct 12, 2006 4:20 pm

Create a file called robots.txt and put it in the top-level folder for your domain (or subdomain) with the following:
Code:
User-agent: Googlebot
Disallow: /

Or if you want to block every crawler:
Code:
User-agent: *
Disallow: /

Solo Ace
Posted: Fri Oct 13, 2006 11:38 am

Haha, omgz I gotta block google's ips!!1

Solo Ace
Posted: Sat Sep 08, 2007 1:39 pm

MGB, what the hell is up with your code?

Code:
$host_ip = $row['session_ip'];

$host_ip = hexdec(substr($row['session_ip'], 0, 2));
$host_ip .= '.';
$host_ip .= hexdec(substr($row['session_ip'], 2, 2));
$host_ip .= '.';
$host_ip .= hexdec(substr($row['session_ip'], 4, 2));
$host_ip .= '.';
$host_ip .= hexdec(substr($row['session_ip'], 6, 2));


phpBB's decode_ip() does exactly the same thing and just looks better, so use the predefined functions!

Code:
// script snippet from phpBB2/includes/functions.php
// line 343 - 353
function encode_ip($dotquad_ip)
{
   $ip_sep = explode('.', $dotquad_ip);
   return sprintf('%02x%02x%02x%02x', $ip_sep[0], $ip_sep[1], $ip_sep[2], $ip_sep[3]);
}

function decode_ip($int_ip)
{
   $hexipbang = explode('.', chunk_split($int_ip, 2, '.'));
   return hexdec($hexipbang[0]). '.' . hexdec($hexipbang[1]) . '.' . hexdec($hexipbang[2]) . '.' . hexdec($hexipbang[3]);
}
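
For comparison, the same conversion can probably be done with PHP's built-in long2ip(). This is only a sketch, assuming session_ip is always the 8-digit hex form that encode_ip() above produces for an IPv4 address, and assuming a 64-bit PHP build so the value fits in an integer. The name hex_to_dotquad is made up for this example and is not part of phpBB.

```php
<?php
// Hypothetical helper (not part of phpBB): converts an 8-digit hex IPv4
// string, as stored in session_ip, to dotted-quad form.
function hex_to_dotquad(string $hex_ip): string
{
    // Reject anything that is not exactly 8 hex digits.
    if (!preg_match('/^[0-9a-fA-F]{8}$/', $hex_ip)) {
        throw new InvalidArgumentException("Not an 8-digit hex IPv4 string: $hex_ip");
    }
    // hexdec() gives the 32-bit address as a number; long2ip() formats it.
    // Assumes 64-bit PHP, where values up to ffffffff fit in an int.
    return long2ip((int) hexdec($hex_ip));
}

echo hex_to_dotquad('7f000001'), "\n"; // 127.0.0.1
```

Unlike decode_ip(), this version throws on malformed input instead of quietly returning a nonsense address.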

Server Help Forum Index -> Trash Talk
All times are GMT - 5 Hours
Server Load: 69 page(s) served in previous 5 minutes.

phpBB Created this page in 0.570645 seconds : 38 queries executed (84.2%): GZIP compression disabled