How to Replace WordPress Post Links in Bulk

Recently, I wanted to move several of my existing WordPress blogs into a single WordPress multi-site setup. After doing so, I realized that all of my image links were broken, because WordPress multi-site organizes its media files differently.

In particular, images no longer reside in the wp-content/uploads directory, but rather in the files directory.

I could create a 301 redirect for all of my old links, but I would prefer to rewrite the old links themselves so that they point to the new image file location.

Here we consider how to replace WordPress post and page links in bulk using PHP.

1. Extract HTML and Image Links

First, we use regular expressions to extract HTML and image links from our WordPress post and page content.

Get HTML Links

function _get_html_links($content) {
	preg_match_all('/<a [^>]+>/i',$content, $links);

	$result = array();		
	foreach( $links[0] as $link_tag ) {
		preg_match_all('/(href)=("[^"]*")/i',$link_tag, $result[$link_tag]);
	}
	return $result;
}

Line 2 – Extract all HTML link tags, i.e., content that is encapsulated within the pattern <a and >.
Line 6 – Extract the href attribute from each link tag. Note that the pattern only matches double-quoted values.

Example results for HTML links look like this -

$links = Array ( [0] => Array (
            [0] => <a href="http://akismet.com/"> 
            [1] => <a href="http://automattic.com/wordpress-plugins/"> 
            [2] => <a href="http://wordpress.org/"> 
        )
)

$result = Array (
[<a href="http://akismet.com/">] => 
Array (   [0] => Array (
                    [0] => href="http://akismet.com/" )
 
            [1] => Array (
                    [0] => href )
 
            [2] => Array (
                    [0] => "http://akismet.com/" )
         )
 
[<a href="http://automattic.com/wordpress-plugins/">] => 
Array (
            [0] => Array (
                    [0] => href="http://automattic.com/wordpress-plugins/" )
 
            [1] => Array (
                    [0] => href )
 
            [2] => Array (
                    [0] => "http://automattic.com/wordpress-plugins/" ) 
        ) ...
)
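To make the nested result easier to read, the sketch below runs the same two regular expressions on a small sample string and flattens the matches into a plain list of URLs. The helper name list_hrefs and the sample HTML are assumptions made for illustration; they are not part of the original code.

```php
<?php
// Hypothetical helper (not part of the original tutorial): returns the href
// values found in $content, using the same patterns as _get_html_links.
function list_hrefs($content) {
	preg_match_all('/<a [^>]+>/i', $content, $links);

	$urls = array();
	foreach ($links[0] as $link_tag) {
		// $m[2] holds the quoted attribute values, e.g. "http://wordpress.org/"
		if (preg_match_all('/(href)=("[^"]*")/i', $link_tag, $m)) {
			foreach ($m[2] as $quoted) {
				$urls[] = trim($quoted, '"');
			}
		}
	}
	return $urls;
}

// Sample content, made up for this example.
$sample = '<p><a href="http://akismet.com/">Akismet</a> and '
        . '<a class="x" href="http://wordpress.org/">WordPress</a></p>';
print_r(list_hrefs($sample));
```

Closing tags such as </a> are skipped because the first pattern requires a space after <a, and single-quoted href values are not picked up at all.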

Get Image Links

function _get_image_links($content) {		
	preg_match_all('/<img [^>]+>/i',$content, $image_links);

	$result = array();
	foreach( $image_links[0] as $image_tag) {
		preg_match_all('/(alt|title|src|width|height)=("[^"]*")/i',$image_tag, $result[$image_tag]);
	}
	return $result;
}

Line 2 – Extract all image tags, i.e., content that is encapsulated within the pattern <img and >.
Line 6 – Extract the relevant attributes (alt, title, src, width, height) from each image tag.

Example results for image links look like this -

$image_links = Array ( [0] => Array (
            [0] => <img src="http://shibashake.com/test-site/wp-content/uploads/2010/07/Music_Singer_Footer7-280x93.jpg" alt="" title="Music_Singer_Footer7" width="280" height="93" class="alignright size-medium wp-image-2666" /> 
            [1] => <img src="http://shibashake.com/test-site/wp-content/uploads/2010/07/Music_Singer_Bottom7-109x360.jpg" alt="Testing image insertion" title="Music_Singer_Bottom7" width="109" height="360" class="size-medium wp-image-2665" /> 
        ) 
)

$result = Array (
    [<img src="http://shibashake.com/test-site/wp-content/uploads/2010/07/Music_Singer_Footer7-280x93.jpg" alt="" title="Music_Singer_Footer7" width="280" height="93" class="alignright size-medium wp-image-2666" />] => 
Array (
            [0] => Array (
                    [0] => src="http://shibashake.com/test-site/wp-content/uploads/2010/07/Music_Singer_Footer7-280x93.jpg"
                    [1] => alt=""
                    [2] => title="Music_Singer_Footer7"
                    [3] => width="280"
                    [4] => height="93" )
 
            [1] => Array (
                    [0] => src
                    [1] => alt
                    [2] => title
                    [3] => width
                    [4] => height )
 
            [2] => Array (
                    [0] => "http://shibashake.com/test-site/wp-content/uploads/2010/07/Music_Singer_Footer7-280x93.jpg"
                    [1] => ""
                    [2] => "Music_Singer_Footer7"
                    [3] => "280"
                    [4] => "93" )
        ) ...
)

2. Process and Replace a Single Link

The process_link function below takes in 3 arguments.

  • $link_replace = Link replacement array of the form -

    array( 'www.shibashake.com' => 'shibashake.com',
           'wp-content/uploads' => 'files' )

    This will replace www.shibashake.com with shibashake.com and wp-content/uploads with files.

  • $link_html = The full link HTML code.
  • $tag_array = Array of link attributes as returned by the _get_html_links and _get_image_links functions in step 1.

function process_link($link_replace, $link_html, $tag_array) {
	global $tmp_post_content, $has_new_content;
	if (!count($tag_array[1]) || !count($tag_array[2])) return NULL;
	$link_attr = array_combine($tag_array[1], $tag_array[2]);
	$link_attr = array_change_key_case($link_attr);
		
	$link = trim(isset($link_attr['src']) ? $link_attr['src'] : (isset($link_attr['href']) ? $link_attr['href'] : ''), "\"'");
	if (!$link) return NULL;

	// Replace link code here
	$new_html = $link_html;
	foreach ($link_replace as $key => $value) {
		if (strpos($link, $key) !== FALSE) {
			$new_html = str_replace($key, $value, $new_html);
		}
	}

	if ($new_html != $link_html) {
		$tmp_post_content = str_replace($link_html, $new_html, $tmp_post_content);
		$has_new_content = TRUE;					
	}
	return $new_html;
}

Lines 3-5 – Create an associative array of the attribute names and values obtained in step 1. For example –

Array (   ['src'] => "http://shibashake.com/test-site/wp-content/uploads/2010/07/Music_Singer_Footer7-280x93.jpg",
          ['alt'] => "",
          ['title'] => "Music_Singer_Footer7",
          ['width'] => "280",
          ['height'] => "93" )

Line 7 – Trim quotation marks from the link (src or href).
Line 8 – If the tag has no link attribute, return from the link replacement function.
Lines 11-16 – Create a new link by replacing elements of our current link with their corresponding values in our replacement array ($link_replace).
Lines 18-21 – Replace our old link with our newly created link in a temporary post content area ($tmp_post_content). This global variable stores all the link changes for a given post. Once all the links within a post have been replaced, we can update the current post content with $tmp_post_content. The global variable $has_new_content indicates whether the current post content has to be updated.
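The replacement loop can be tried in isolation, without the globals or a WordPress install. The following is a minimal sketch, assuming a hypothetical helper named replace_in_tag that receives the already-extracted link as a third argument; the image file name is also made up.

```php
<?php
// Hypothetical stand-alone version of the loop in process_link: applies each
// replacement to the whole tag, but only when the link itself contains the key.
function replace_in_tag($link_replace, $link_html, $link) {
	$new_html = $link_html;
	foreach ($link_replace as $key => $value) {
		if (strpos($link, $key) !== FALSE) {
			$new_html = str_replace($key, $value, $new_html);
		}
	}
	return $new_html;
}

$link_replace = array(
	'www.shibashake.com' => 'shibashake.com',
	'wp-content/uploads' => 'files',
);
// Sample link and tag, made up for this example.
$link = 'http://www.shibashake.com/test-site/wp-content/uploads/2010/07/photo.jpg';
$tag  = '<img src="' . $link . '" alt="" width="280" height="93" />';

echo replace_in_tag($link_replace, $tag, $link), "\n";
// <img src="http://shibashake.com/test-site/files/2010/07/photo.jpg" alt="" width="280" height="93" />
```

Because str_replace runs over the whole tag, any other attribute that happens to contain one of the keys (an alt or title text, for instance) is rewritten as well.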

3. Process and Replace Multiple Links within a Single Post

The process_content_links function below takes in 3 arguments.

  • $post = Replace links for this post object.
  • $link_replace = Link replacement array.
  • $img_links = Indicates whether we want to process image links (TRUE) or HTML links (FALSE).
	
function process_content_links($post, $link_replace, $img_links=TRUE) {
	global $tmp_post_content, $has_new_content;
		
	$has_new_content = FALSE;
	$tmp_post_content = $post->post_content;
	// get links and link information
	if ($img_links)
		$links = _get_image_links($tmp_post_content);
	else
		$links = _get_html_links($tmp_post_content);
			
	// Go through each html tag 
	$result = array(); 
	foreach($links as $link_html => $tag_array) {
		$link_result = process_link($link_replace, $link_html, $tag_array);
		if ($link_result)
			$result[] = $link_result;
	} // end foreach 

	if ($has_new_content) {// update post content
		echo "<!-- {$post->post_title} - Updating post with new content -->";
		global $wpdb;
		// write post content DIRECTLY into database so that it does not go through regular filters which may remove iframes etc.
//		$wpdb->update( 	$wpdb->posts, array( 'post_content' => $tmp_post_content ), array('ID' => $post->ID), 
//				array('%s'), array('%d') );
	}			
	return $result;
}

Lines 4-5 – Initialize the global variables $tmp_post_content and $has_new_content. $tmp_post_content stores all link changes for a given post. $has_new_content indicates whether the current post content has to be updated with $tmp_post_content.
Lines 6-10 – Obtain links and link tags using the functions defined in step 1.
Lines 12-18 – Replace each link according to the elements in our replacement array ($link_replace). Here, we use the process_link function defined in step 2.
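Lines 20-26 – Update the given post if its content has changed. The $wpdb->update call is commented out here, so the function only reports which posts would change until you uncomment it.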

WARNING – Only update the given post after you have made sure that there are no bugs in your code and $tmp_post_content contains the right post links and content.

Line 27 – Return an array of new links.
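One way to keep the database write switched off by default is to wrap it in a small function with a dry-run flag. The sketch below is only an illustration: the function name save_post_content and the $dry_run parameter are assumptions, and $db is expected to be the WordPress $wpdb object. The update call itself is the one that is commented out in the code above.

```php
<?php
// Hypothetical wrapper around the commented-out update in step 3.
// $db is expected to be the WordPress $wpdb object; $dry_run is an assumed
// safety switch that defaults to NOT touching the database.
function save_post_content($db, $post_id, $new_content, $dry_run = TRUE) {
	if ($dry_run) {
		return FALSE; // nothing written
	}
	// Same call as in the tutorial: write post_content directly, so that it
	// does not go through the regular content filters.
	$rows = $db->update(
		$db->posts,
		array('post_content' => $new_content),
		array('ID' => $post_id),
		array('%s'),
		array('%d')
	);
	return $rows !== FALSE;
}
```

Inside process_content_links this could be called as save_post_content($wpdb, $post->ID, $tmp_post_content, FALSE), but only once the generated content has been checked.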

4. Repeat Link Replacement for All Posts and Pages

Finally, we only need to repeat step 3 for all of our posts and/or pages.

function replace_links($link_replace, $img_links=TRUE) {
	// generate links for posts
	$args = array(
		'post_type' => array('post','page'),
		'numberposts' => -1,
		'post_status' => null,
		'post_parent' => null, // any parent
		); 
	$posts = get_posts($args);
	foreach ($posts as $post) {
		$result = process_content_links($post, $link_replace, $img_links); 
		echo "<!-- ";
		print_r($result);
		echo "-->";			
	}	
}

Lines 3-9 – Get all current posts and pages.
Lines 10-15 – Iterate through each post or page, and replace all links using the process_content_links function defined in step 3.
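Putting the pieces together, a call might look like the sketch below. It assumes that the functions from steps 1 to 4 have already been loaded inside WordPress (for example from a temporary, admin-only plugin file), which is why the call is guarded with function_exists. The replacement array is the one used as an example in step 2.

```php
<?php
// Assumption: the functions from steps 1-4 are already loaded inside
// WordPress; outside WordPress the guarded call below is simply skipped.
$link_replace = array(
	'www.shibashake.com' => 'shibashake.com',
	'wp-content/uploads' => 'files',
);

if (function_exists('replace_links')) {
	// TRUE processes <img> tags; pass FALSE instead to process <a> tags.
	replace_links($link_replace, TRUE);
}
```

The results are echoed inside HTML comments, so they show up in the page source rather than on the rendered page.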

5. We Are Done!

Only enable the database update ($wpdb->update) after you have fully debugged your code and are sure that there are no errors in your replacement links. This will protect you from losing data or losing work.

Start by updating a single post. Only iterate through multiple posts once you are sure that everything is working correctly.

DO NOT attempt this if you are uncomfortable with PHP or do not understand the code given in this tutorial.

If you are just interested in doing a plain text replace in your SQL database, the command would look something like this (assuming the default wp_ table prefix) -

UPDATE wp_posts SET post_content = REPLACE(post_content,'www.domain.com','www.newdomain.com');

Comments

  1. moepstr says

    Hello,

    your post is really the first one i’ve found that deals with this particular problem.

    Mine however seems to be a bit worse: not only did i have /wp-content/uploads before but i didn’t have any kind of year/month based structure.

    So, now, i need to transform from /wp-content/uploads to /files/year/month in a multisite environment…

    Do you have an idea on how to tackle that?

    Also, a follow-up post to this one – where everything is laid out a little better explained on how to put those various code-snippets together – would be _much_ appreciated :)

  2. says

    I agree with first comment. This is the second time I’ve come across your guides, the first time was with your guide to adding custom columns to WordPress post editing screens, which was one of the best guides to doing this that I found anywhere and made it easy for me to do something I thought would be horrific.

    I haven’t done anything replacing any links in bulk yet. What I’m wondering is if it has applications for adding java onclick events to links at pageload time for monitoring in Google Analytics. Or would this slow a site down too much?

    I’m guessing some plugins do something similar – e.g. Yoast’s GA plugin has a feature for adding onclick event handling to specified internal links. But maybe the plugin does something more complex than I imagine.

    Thoughts welcome. Thanks again for the great guides!

  3. __B__ says

    It’s a shame there isn’t a single comment about this really great post. I’ve been looking for a guide like yours for some time. Keep the good work!!!!
