Server Help

Non-Subspace Related Coding - Basic winsock trouble getting correct source

Anonymous - Mon Aug 13, 2012 12:45 am
Post subject: Basic winsock trouble getting correct source
Hi, I'm trying to make a simple program that gets the first 512 characters of the page source of a Macy's product page.

simple code
Code:

   string url = "GET http://www1.macys.com/shop/product/?ID=";
   url += argv[1]; // Add product id to URL
   url += "\r\nHTTP/1.1\r\n";
   url += "Host: www1.macys.com\r\n";
   url += "Connection: close\r\n";

   char buf[BUFFSIZE]; // 512

   send(sock,url.c_str(),url.length(),0); // Send the request

   recv(sock,buf,BUFFSIZE,0);



The problem is the following. Let's say the product's web ID is 546916, so the URL would be http://www1.macys.com/shop/product/?ID=546916

The source I receive is
Code:


            <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
            <html xmlns:jsp="http://java.sun.com/products/jsp/dtd/jsp_1_0.dtd" lang="en-us">
<head>
<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<META NAME="ROBOTS" CONTENT="NOINDEX"><!-- o -->
<title>Not Found - Macy's</title>
<meta http-equiv="generator" content="JACPKMALPHTCSJDTCR">

<script type="text/javascript" language="JavaScript">
<!--
                              function getForwardPageURL()
                              ????????


In fact, if you put the URL into a web browser, the source should be

Code:
<!DOCTYPE html>
<html lang="en-us">
<head>
<title>JS Boutique Dress, One Shoulder Draped Evening Dress - Womens Dresses - Macy's</title>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<meta http-equiv="generator" content="JACPKMALPHTCSJDTCR" />


The META NAME="ROBOTS" tag in the source that my program receives makes me think Macy's did something. What can I do? Please help.
Cheese - Wed Aug 15, 2012 1:38 pm
Post subject:
The user agents are different, and the server sends different content accordingly.


Also, your GET is wrong; that's a 404 page.
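
To make both points concrete, here is a minimal sketch of a well-formed request. In the posted code, "HTTP/1.1" ends up on its own line and the headers are never closed with a blank line. The helper names build_get_request and parse_status_code, their parameters, and any User-Agent value passed in are assumptions for illustration, not anything from the original post, and a browser-like User-Agent is not guaranteed to make the server return the product page.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds a minimal HTTP/1.1 GET request.
// host, path and user_agent are illustrative parameters.
std::string build_get_request(const std::string& host,
                              const std::string& path,
                              const std::string& user_agent)
{
    assert(!host.empty() && "HTTP/1.1 requires a Host header");

    std::string req;
    req += "GET " + path + " HTTP/1.1\r\n";      // method, path and version on ONE line
    req += "Host: " + host + "\r\n";
    req += "User-Agent: " + user_agent + "\r\n"; // servers may vary content by this header
    req += "Connection: close\r\n";
    req += "\r\n";                                // blank line terminates the headers
    return req;
}

// Hypothetical helper: extracts the numeric status code from a response
// that starts with a status line such as "HTTP/1.1 404 Not Found".
// Returns -1 if the text does not start with a valid status line.
int parse_status_code(const std::string& response)
{
    if (response.compare(0, 5, "HTTP/") != 0) return -1;

    std::string::size_type sp = response.find(' ');
    if (sp == std::string::npos || sp + 4 > response.size()) return -1;

    int code = 0;
    for (int i = 1; i <= 3; ++i) {
        char c = response[sp + i];
        if (c < '0' || c > '9') return -1;
        code = code * 10 + (c - '0');
    }
    return code;
}
```

The result can be passed to send() the same way as before, for example send(sock, req.c_str(), (int)req.length(), 0). Checking the status code of what recv() returns tells you directly whether the server answered 404, instead of guessing from the markup. Note that recv() does not null-terminate the buffer, so use its return value as the length when printing.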
Anonymous - Tue Sep 04, 2012 4:14 pm
Post subject:
Kind of pointless to post, but I figured I'd mention it in case anyone comes across the same problem (4e-60000% chance): it's Google, which starts asking for a CAPTCHA after about 700 requests to Google Search :\
All times are -5 GMT