This is a step-by-step guide to using PHP to retrieve your HubPages comments and import them as WordPress comments. There are three key steps involved in this process –
- Retrieving your HubPages comments from an HTML page.
- Writing out those comments in the WordPress XML format.
- Importing the XML file into WordPress.
This article focuses only on the first step: retrieving HubPages comments using PHP.
To do this yourself, you are going to need access to a server that is capable of executing PHP scripts. If you do not have access to such a server, then check out A Simple Way for Importing HubPages Comments into WordPress.
Retrieving HubPages Comments
To retrieve your HubPages comments, you need to write an HTML text scraper. The first thing to do is open the target hub and read its contents into a string that you can then parse.
This can be done with cURL or with simple native PHP functions. For now, let us use the simpler native function, file_get_contents.
1. Open HubPages File
$str = file_get_contents("http://hubpages.com/hub/HUB-NAME");
if ($str === false || $str === "") {
    echo "Cannot open file\n";
    exit;
}
$pos = 0; // current read position within $str (used by getData below)
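The cURL route mentioned above can be sketched as follows. This is a minimal example under stated assumptions, not the article's own code: fetchPage is a hypothetical helper name, and it assumes PHP's cURL extension is installed. Compared with file_get_contents, it adds a timeout and treats non-2xx HTTP responses as failures.

```php
<?php
// Hypothetical helper: fetch a page with PHP's cURL extension and return
// its HTML as a string, or false on failure (like file_get_contents).
function fetchPage($url)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow HTTP redirects
        CURLOPT_TIMEOUT        => 30,    // give up after 30 seconds
    ));
    $html = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    // Treat transport errors and non-2xx HTTP responses as failures.
    if ($html === false || $status < 200 || $status >= 300) {
        return false;
    }
    return $html;
}
```

You would then replace the file_get_contents line with $str = fetchPage("http://hubpages.com/hub/HUB-NAME"); and keep the same error check.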
Once the file is open, we can start parsing the HTML text within it. The approach is to locate known HTML patterns within the HubPages page and retrieve the text between a startPattern and an endPattern.
For example, we may want to retrieve the title of the HTML file. To do this, we look for the startPattern <title> and the endPattern </title>. Note that strpos is case-sensitive, so the patterns must match the case used in the page. Since we must repeat this operation many times while processing the HubPages file, it is best written as a PHP function.
2. Get Text Between startPattern and endPattern
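function getData($startPattern, $endPattern)
{
    global $pos, $str;
    // Look for startPattern, beginning at the current position.
    $pos = strpos($str, $startPattern, $pos);
    if ($pos === false) {
        $pos = strlen($str); // not found: move the pointer to the end
        return "";
    }
    $start = $pos + strlen($startPattern); // data begins right after startPattern
    $end = strpos($str, $endPattern, $start);
    if ($end === false) {
        $pos = strlen($str); // unterminated data: treat as not found
        return "";
    }
    $pos = $end; // leave the pointer at the start of endPattern
    return substr($str, $start, $end - $start);
}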
This function returns the text between startPattern and endPattern. If startPattern is not found within the HubPages file, the function returns an empty string and moves $pos to the end of the file. The comment loop below relies on this to know when to stop.
Also note that the function uses two globals, $str and $pos. $str is the HubPages file you are processing and $pos is the current pointer within that file. This is similar to using your finger to point while reading a book.
$pos tells the function where to start looking for startPattern, i.e. start from where your finger is pointing and only look for startPattern from there onward. Any occurrences of startPattern before your finger are ignored. After a successful match, $pos is left at the start of endPattern, so the next call continues from there.
Now you are ready to extract the data you want from your HubPages files. For example,
$title = getData('<title>', '</title>');
will store the TITLE of the HTML document in $title.
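To see how the shared $pos pointer behaves, here is a small self-contained run of getData on a made-up HTML string (not real HubPages markup). This sketch uses a version of getData that also returns an empty string when endPattern is missing, rather than computing a length from false.

```php
<?php
// Made-up sample page; real HubPages markup is much larger.
$str = '<html><head><title>My Hub</title></head>'
     . '<body><b>first</b> and <b>second</b></body></html>';
$pos = 0; // the "finger": current read position within $str

function getData($startPattern, $endPattern)
{
    global $pos, $str;
    $pos = strpos($str, $startPattern, $pos);
    if ($pos === false) {
        $pos = strlen($str); // not found: jump to the end
        return "";
    }
    $start = $pos + strlen($startPattern);
    $end = strpos($str, $endPattern, $start);
    if ($end === false) {
        $pos = strlen($str);
        return "";
    }
    $pos = $end; // next search resumes here
    return substr($str, $start, $end - $start);
}

$title  = getData('<title>', '</title>'); // "My Hub"
$first  = getData('<b>', '</b>');         // "first"
$second = getData('<b>', '</b>');         // "second": search resumes at $pos
$third  = getData('<b>', '</b>');         // "": no more matches, $pos at end

echo "$title | $first | $second | [$third]\n"; // prints: My Hub | first | second | []
```

Because the pointer only moves forward, the same call getData('<b>', '</b>') returns each match in turn, which is exactly what the comment loop relies on.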
$title = getData('<title>', '</title>');
// Point to the start of the comments section
$pos = strpos($str, 'class="module moduleComment"');
if ($pos === false) {
    echo "No comments available\n";
} else {
    $maxlen = strlen($str);
    $comments = array(); // collected comments, one array per comment
    // While there are still comments
    while ($pos < $maxlen) {
        $status = getData('Status: ', '</div>');
        if ($pos == $maxlen) break; // no further comment found
        $IP        = getData('<span>ip: ', '</span>');
        $authorStr = getData('class="comment_meta"><strong>', ' says:</strong>');
        $dateStr   = getData('<small>', '</small>');
        $comment   = getData('<p>', '</p></div>');
        $comments[] = array(
            'status'  => $status,
            'ip'      => $IP,
            'author'  => $authorStr,
            'date'    => $dateStr,
            'content' => $comment,
        );
    }
}
The PHP code fragment above retrieves the status, IP address, author, date, and text of each of your HubPages comments. Note that status and IP address information is only available if you are logged into your HubPages account from the server that executes the PHP script. More about this later.
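The loop can be exercised offline against hypothetical markup that only imitates the patterns it searches for. Real HubPages pages contain far more HTML, and the sample below is an assumption for illustration only. The sketch also collects each comment into a $comments array (an illustrative name), so the data is ready for the WordPress XML step.

```php
<?php
// Hypothetical markup for one comment, built only from the search patterns.
function fakeComment($status, $ip, $author, $date, $text)
{
    return '<div>Status: ' . $status . '</div>'
         . '<span>ip: ' . $ip . '</span>'
         . '<div class="comment_meta"><strong>' . $author . ' says:</strong>'
         . '<small>' . $date . '</small></div>'
         . '<div><p>' . $text . '</p></div>';
}

$str = '<title>Test Hub</title><div class="module moduleComment">'
     . fakeComment('approved', '192.0.2.1', 'Alice', '2 days ago', 'Great hub!')
     . fakeComment('approved', '192.0.2.2', 'Bob', '3 weeks ago', 'Thanks.');
$pos = 0;

function getData($startPattern, $endPattern)
{
    global $pos, $str;
    $pos = strpos($str, $startPattern, $pos);
    if ($pos === false) {
        $pos = strlen($str);
        return "";
    }
    $start = $pos + strlen($startPattern);
    $end = strpos($str, $endPattern, $start);
    if ($end === false) {
        $pos = strlen($str);
        return "";
    }
    $pos = $end;
    return substr($str, $start, $end - $start);
}

$comments = array();
$pos = strpos($str, 'class="module moduleComment"');
if ($pos !== false) {
    $maxlen = strlen($str);
    while ($pos < $maxlen) {
        $status = getData('Status: ', '</div>');
        if ($pos == $maxlen) break; // no more comments
        $IP        = getData('<span>ip: ', '</span>');
        $authorStr = getData('class="comment_meta"><strong>', ' says:</strong>');
        $dateStr   = getData('<small>', '</small>');
        $comment   = getData('<p>', '</p></div>');
        $comments[] = array('status' => $status, 'ip' => $IP,
            'author' => $authorStr, 'date' => $dateStr, 'content' => $comment);
    }
}

foreach ($comments as $c) {
    echo "{$c['author']} ({$c['date']}): {$c['content']}\n";
}
// prints:
// Alice (2 days ago): Great hub!
// Bob (3 weeks ago): Thanks.
```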
In the meantime, we are not done yet, because the author information we have retrieved may be an HTML link, e.g.
<a href="http://www.shibashake.com">ShibaShake</a>
We want to process this string and separate the author name from the website it is associated with.
Get Author Name and Link
$authorStr = getData('class="comment_meta"><strong>', ' says:</strong>');
$link = getDataFromStr($authorStr, '<a href="', '">');
if ($link === "") {
    // Plain-text author with no website link
    $author = $authorStr;
} else {
    // Linked author: the name sits between the end of the <a> tag and </a>
    $author = getDataFromStr($authorStr, '>', '</a>');
}
Note that getDataFromStr is similar to our previous getData function except that instead of operating on the global HubPages file, it just operates on a simple string that you pass into the function.
The PHP code for getDataFromStr is essentially the same as the getData function except that there are no global variables, and our finger pointer always starts at the beginning of the string.
function getDataFromStr($str, $startPattern, $endPattern)
{
    // Always search from the beginning of the string that was passed in.
    $pos = strpos($str, $startPattern);
    if ($pos === false) {
        return "";
    }
    $start = $pos + strlen($startPattern);
    $end = strpos($str, $endPattern, $start);
    if ($end === false) {
        return ""; // endPattern missing: treat as no match
    }
    return substr($str, $start, $end - $start);
}
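Putting getDataFromStr together with the author/link logic gives the following sketch. splitAuthor is a hypothetical wrapper name, and, as with getData, this version of getDataFromStr returns an empty string when endPattern is missing.

```php
<?php
function getDataFromStr($str, $startPattern, $endPattern)
{
    $pos = strpos($str, $startPattern);
    if ($pos === false) {
        return "";
    }
    $start = $pos + strlen($startPattern);
    $end = strpos($str, $endPattern, $start);
    if ($end === false) {
        return "";
    }
    return substr($str, $start, $end - $start);
}

// Hypothetical wrapper around the author/link logic.
// Returns array(name, url); url is "" when the author is plain text.
function splitAuthor($authorStr)
{
    $link = getDataFromStr($authorStr, '<a href="', '">');
    if ($link === "") {
        return array($authorStr, "");
    }
    return array(getDataFromStr($authorStr, '>', '</a>'), $link);
}

list($name, $url) = splitAuthor('<a href="http://www.shibashake.com">ShibaShake</a>');
echo $name . ' -> ' . $url . "\n";      // prints: ShibaShake -> http://www.shibashake.com

list($name2, $url2) = splitAuthor('Anonymous');
echo $name2 . ' -> [' . $url2 . "]\n";  // prints: Anonymous -> []
```

The name lookup works because the first '>' in a linked author string is the one that closes the <a> tag.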
Now we are almost done. The last thing we must do is convert the date string.
HubPages shows comment dates in a relative, "number of days ago" format. Rather than showing a date such as 24th July 2009, HubPages instead shows "2 days ago", "3 weeks ago", or "1 month ago".
The PHP function strtotime can easily convert these English phrases into an actual timestamp. Note, however, that for older comments you will lose the actual day on which the comment was made; all you will have is the week or month.
To make sure that your comments are still properly ordered within WordPress, you must also add a time offset. The function below adds one extra minute for each comment it processes, so comments that share the same relative date (for example, all the "2 months ago" comments) keep the order in which they appear on the hub, with the first one processed treated as the oldest.
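function translateDate($dateStr)
{
    static $offset = 0;             // grows by one for every comment processed
    $time = strtotime($dateStr);    // e.g. "3 weeks ago" -> Unix timestamp
    if ($time === false) {
        $time = time();             // unparseable date: fall back to the current time
    }
    $time = $time + ($offset * 60); // add $offset minutes
    $offset++;
    return date('Y-m-d H:i:s', $time);
}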
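A quick way to check the offset behavior is to call translateDate several times. This sketch makes one assumption of its own: if strtotime cannot parse a date string, it falls back to the current time instead of producing a 1970 date. It also sets the timezone to UTC so the results are not affected by daylight saving changes.

```php
<?php
date_default_timezone_set('UTC');

function translateDate($dateStr)
{
    static $offset = 0;             // one extra minute per comment processed
    $time = strtotime($dateStr);
    if ($time === false) {
        $time = time();             // assumption: unparseable -> now
    }
    $time = $time + ($offset * 60);
    $offset++;
    return date('Y-m-d H:i:s', $time);
}

$a = translateDate('3 weeks ago'); // first comment in the "3 weeks ago" group
$b = translateDate('3 weeks ago'); // second one, about a minute later
$c = translateDate('2 days ago');
echo "$a\n$b\n$c\n";
```

Because strtotime works relative to the current time, the exact output changes from run to run, but the second "3 weeks ago" comment always lands about one minute after the first.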
IP Address Information
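Note – As mentioned previously, the status and IP address information will only be available if you are logged in to your HubPages account. If you do not want to deal with this extra step, and do not particularly care about the IP address information in your comments, then just remove those lines of code and skip this section. If you remove the Status line, move the check if ($pos == $maxlen) break; so that it directly follows the first remaining getData call (for example, the author lookup). Without a logged-in page, 'Status: ' never appears, so keeping the Status line would end the loop before any comments are read.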
If you want to retrieve status and IP address information, there are two ways to achieve this –
Simple method –
- Log into your HubPages account.
- Go to the HubPages file that you want to retrieve the comments from.
- Save the HTML file onto your local computer.
- Upload the HTML file onto your server.
- Point your PHP script at the HTML file you just uploaded, i.e. pass its path to file_get_contents instead of the hub URL.
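For the simple method, the only change is where the HTML comes from. A minimal sketch, assuming the saved page was uploaded to the server; loadSavedHub and my-hub.html are hypothetical names.

```php
<?php
// Hypothetical helper for the "simple method": read the hub page that was
// saved while logged in and uploaded to the server. Returns the HTML string,
// or false if the file is missing, unreadable, or empty.
function loadSavedHub($path)
{
    if (!is_readable($path)) {
        return false;
    }
    $html = file_get_contents($path);
    return ($html === false || $html === "") ? false : $html;
}

// Example (my-hub.html is a placeholder for the file you uploaded):
// $str = loadSavedHub(__DIR__ . '/my-hub.html');
// if ($str === false) { echo "Cannot open file\n"; exit; }
```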
Automatic method –
The more elegant way to get this additional information is to temporarily log into your HubPages account from your PHP script. This is most easily done with PHP's cURL functions, which can keep the login session cookie between requests.
Note that frequent log-ins and logouts from your HubPages account may have a bad effect on your Hub Score.
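A sketch of the automatic method using PHP's cURL cookie options follows. The login URL, form field names, and function name are all placeholders: HubPages' actual login form is not described here and may have changed, so inspect the login page before adapting it. Logging in once and reusing the same handle for the hub request keeps the number of log-ins to one per run, in line with the note above.

```php
<?php
// Hypothetical sketch of the "automatic method". The login URL and form
// field names passed in are placeholders, NOT HubPages' real values.
// Returns the hub HTML as a string, or false on failure.
function fetchHubLoggedIn($loginUrl, array $loginFields, $hubUrl, $cookieFile)
{
    $ch = curl_init();
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,        // return bodies as strings
        CURLOPT_FOLLOWLOCATION => true,        // follow post-login redirects
        CURLOPT_TIMEOUT        => 30,
        CURLOPT_COOKIEFILE     => $cookieFile, // enable cookies, read saved ones
        CURLOPT_COOKIEJAR      => $cookieFile, // save cookies when the handle closes
    ));

    // Step 1: submit the login form; this handle keeps the session cookie.
    curl_setopt($ch, CURLOPT_URL, $loginUrl);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($loginFields));
    if (curl_exec($ch) === false) {
        curl_close($ch);
        return false;
    }

    // Step 2: request the hub with the same handle, switching back to GET.
    curl_setopt($ch, CURLOPT_HTTPGET, true);
    curl_setopt($ch, CURLOPT_URL, $hubUrl);
    $html = curl_exec($ch);
    curl_close($ch);
    return $html;
}
```

Example call, with every argument a placeholder: $str = fetchHubLoggedIn('LOGIN-URL', array('USER-FIELD' => 'your-name', 'PASS-FIELD' => 'your-password'), 'http://hubpages.com/hub/HUB-NAME', '/tmp/hubpages-cookies.txt');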
That’s It!
Running this PHP script will allow you to extract all the relevant information on your HubPages comments.
Now, you only need to write out this data in the WordPress XML file format so that you can import it into WordPress.
Also be aware that the script above is based on the current HubPages HTML file format (as of 24th July 2009). If HubPages decides to change the structure of their hubs, you will likely need to update the startPatterns and endPatterns used to extract the relevant HubPages comment information.