Multiple cURL Requests with PHP


PHP has a set of cURL functions that let your script download other web pages. If you use cURL to scrape data or build mashups, you may need to fetch more than one page. Done one at a time, this can become a serious performance problem: each curl_exec call blocks until its response comes back, so the wait times of the individual requests add up and can put whole seconds onto your own script's runtime.

Enter curl_multi_init. This family of functions allows you to combine cURL handles and execute them simultaneously.

    // this example does NOT use simultaneous requests, it must wait for each response
    
    // request 1
    $ch = curl_init('http://webservice.one.com/');  // initialize the request
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the page contents as a string instead of printing them
    $response_1 = curl_exec($ch);                   // actually make the request (blocks until it finishes)
    
    // request 2 (does not start until request 1 has finished)
    $ch = curl_init('http://webservice.two.com/');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $response_2 = curl_exec($ch);
    
    // normally you would process your results here
    echo "$response_1 $response_2";


    // with curl_multi, you only have to wait for the longest-running request
    
    // build the individual requests as above, but do not execute them
    $ch_1 = curl_init('http://webservice.one.com/');
    $ch_2 = curl_init('http://webservice.two.com/');
    curl_setopt($ch_1, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch_2, CURLOPT_RETURNTRANSFER, true);
    
    // build the multi-curl handle, adding both $ch
    $mh = curl_multi_init();
    curl_multi_add_handle($mh, $ch_1);
    curl_multi_add_handle($mh, $ch_2);
    
    // execute all queries simultaneously, and continue when all are complete
    $running = null;
    do {
        curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh); // sleep until a transfer has activity instead of spinning the CPU
        }
    } while ($running);
    
    // all of our requests are done, we can now access the results
    $response_1 = curl_multi_getcontent($ch_1);
    $response_2 = curl_multi_getcontent($ch_2);
    
    // release the handles
    curl_multi_remove_handle($mh, $ch_1);
    curl_multi_remove_handle($mh, $ch_2);
    curl_multi_close($mh);
    
    echo "$response_1 $response_2"; // same output as first example
    

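If both web services take one second to respond, the first example needs about two seconds, because the second request doesn't start until the first one finishes, while the second example needs only about one. We literally cut our page load time in half. Sweet!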
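The two-request pattern generalizes to any number of URLs. Here's a sketch of it wrapped in a function; the name `fetch_all` and the 10-second default timeout are illustrative choices, not part of PHP's cURL extension. It uses `curl_multi_select()` so the loop sleeps until a transfer has activity instead of spinning, and `curl_multi_info_read()` to spot transfers that ended in a cURL error (with `CURLOPT_FAILONERROR` set, HTTP 4xx/5xx responses count as errors too).

```php
<?php
// Hypothetical helper: fetch every URL in $urls concurrently and return the
// response bodies under the same keys. Failed transfers come back as null.
function fetch_all(array $urls, $timeout = 10)
{
    $mh = curl_multi_init();
    $handles = array();

    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string
        curl_setopt($ch, CURLOPT_FAILONERROR, true);    // treat HTTP 4xx/5xx as failures
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);    // assumed limit so one slow host can't hang the page
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    // Run all transfers; curl_multi_select() blocks (up to 1 second)
    // until a transfer has activity, so the loop doesn't spin the CPU.
    $running = 0;
    do {
        curl_multi_exec($mh, $running);
        if ($running > 0) {
            curl_multi_select($mh, 1.0);
        }
    } while ($running > 0);

    // Record which transfers finished with a cURL error.
    $failed = array();
    while (($info = curl_multi_info_read($mh)) !== false) {
        if ($info['result'] !== CURLE_OK) {
            $failed[] = array_search($info['handle'], $handles, true);
        }
    }

    // Collect the bodies and release every handle.
    $results = array();
    foreach ($handles as $key => $ch) {
        $results[$key] = in_array($key, $failed, true) ? null : curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);

    return $results;
}
```

For example, `fetch_all(array('one' => 'http://webservice.one.com/', 'two' => 'http://webservice.two.com/'))` returns an array keyed `one` and `two`, with `null` in place of any response that failed. Keying the results by the caller's own keys saves you from tracking which handle belonged to which request.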

In Action: Twitter

Here's an example where we run multiple Twitter searches and combine the results to display them on our own site.

As a bonus, it also caches the results for one minute, so a burst of visitors doesn't push us past Twitter's rate limit. You can change $minutes to any value you're comfortable with, but keep the cache: without it, a busy page makes so many requests that Twitter starts rejecting them, and you end up with a completely blank list at precisely the moment the most people are looking at it.
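The cache check boils down to "serve the file if it's younger than N seconds, otherwise rebuild it." Here's a minimal sketch of that logic as two reusable helpers. The names `cache_get` and `cache_put` are hypothetical, and the `LOCK_EX` flag is an addition of ours so that two visitors refreshing the cache at the same moment can't interleave their writes.

```php
<?php
// Hypothetical helper: return the cached string if $file exists and is
// younger than $maxAgeSeconds, otherwise null so the caller rebuilds it.
function cache_get($file, $maxAgeSeconds)
{
    clearstatcache(true, $file); // PHP caches stat results (file_exists, filemtime) per request
    if (!is_file($file) || filemtime($file) <= time() - $maxAgeSeconds) {
        return null;
    }
    $contents = file_get_contents($file);
    return $contents === false ? null : $contents;
}

// Hypothetical helper: store $contents while holding an exclusive lock,
// so simultaneous writers take turns instead of mixing their output.
function cache_put($file, $contents)
{
    return file_put_contents($file, $contents, LOCK_EX) !== false;
}
```

Inside `tweets()`, the check would become `$cached = cache_get($cache, 60 * $minutes); if ($cached !== null) { return $cached; }`, with `cache_put($cache, $output)` in place of the final `file_put_contents()` call.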

    function tweets() {
        
        // check cache
        $cache = 'twitter-search.txt';
        $minutes = 1; // how long to wait before refreshing the cache
        clearstatcache(); // make sure file_exists() and filemtime() see fresh data
        if (file_exists($cache) && filemtime($cache) > (time() - (60 * $minutes))) {
            return file_get_contents($cache);
        }
        
        // we are going to search for tweets mentioning these keywords
        $keywords = array(
            'javascript',
            'html5',
            'css3'
        );
        
        // build the requests
        $ch = array();
        $mh = curl_multi_init();
        for ($i = 0; $i < count($keywords); $i++) {
            $keyword = $keywords[$i];
            $ch[$i] = curl_init();
            curl_setopt($ch[$i], CURLOPT_URL,
                    'http://search.twitter.com/search.json?rpp=3&q=' . urlencode($keyword));
            curl_setopt($ch[$i], CURLOPT_USERAGENT,
                    'Twitter requires you to set a user agent, any value works here.');
            curl_setopt($ch[$i], CURLOPT_RETURNTRANSFER, true);
            curl_setopt($ch[$i], CURLOPT_HEADER, false);
            curl_multi_add_handle($mh, $ch[$i]);
        }
        
        // execute the requests simultaneously
        $running = 0;
        do {
            curl_multi_exec($mh, $running);
            if ($running > 0) {
                curl_multi_select($mh); // wait for network activity instead of busy-looping
            }
        } while ($running > 0);
        
        // display the results
        $output = '';
        for ($i = 0; $i < count($keywords); $i++) {
            // decode this keyword's response; the tweets are in its "results" array
            $data = json_decode((string) curl_multi_getcontent($ch[$i]), true);
            $results = isset($data['results']) ? $data['results'] : array();
            $resultCount = count($results);
            
            // link to our keyword
            $output .= '<dl><dt><a href="http://search.twitter.com/search?q=' . urlencode($keywords[$i]) . '">'
                     . htmlspecialchars($keywords[$i]) . '</a></dt>';
            
            // dump the search results, one <dd> per tweet
            for ($j = 0; $j < $resultCount; $j++) {
                $id = $results[$j]['id'];                          // tweet (status) ID
                $user = $results[$j]['from_user'];                 // twitter user name
                $tweet = $results[$j]['text'];                     // tweet text
                $url = "http://www.twitter.com/$user/status/$id/"; // link to the tweet
                
                $output .= '<dd><a href="' . htmlspecialchars($url) . '">'
                         . htmlspecialchars($tweet, ENT_QUOTES, 'UTF-8', false)
                         . ' &mdash; ' . htmlspecialchars($user) . '</a></dd>';
            }
            $output .= '</dl>';
            
            // release this request's handle
            curl_multi_remove_handle($mh, $ch[$i]);
        }
        curl_multi_close($mh);
        
        file_put_contents($cache, $output); // store in local cache for performance boost
        return $output;
    }
    echo tweets();
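The display loop is the part of `tweets()` most worth testing, and it doesn't need the network at all. One option is to pull it into a function that takes a keyword and a raw response body. This is a sketch assuming the old search.json shape, where the tweets sit in a `results` array whose entries carry the `id`, `from_user` and `text` fields the loop reads; `render_keyword` is a hypothetical name. Everything taken from the response is escaped before it reaches the page, and a failed or malformed response simply renders an empty list.

```php
<?php
// Hypothetical helper: turn one search.json-style response body into
// the <dl> markup for a single keyword.
// Assumed shape: {"results": [{"id": ..., "from_user": ..., "text": ...}, ...]}
function render_keyword($keyword, $body)
{
    $data = json_decode((string) $body, true);
    $results = (isset($data['results']) && is_array($data['results'])) ? $data['results'] : array();

    // heading that links back to the search for this keyword
    $html = '<dl><dt><a href="http://search.twitter.com/search?q=' . urlencode($keyword) . '">'
          . htmlspecialchars($keyword) . '</a></dt>';

    // one <dd> per tweet; double_encode=false leaves existing entities alone
    foreach ($results as $tweet) {
        $url = 'http://www.twitter.com/' . rawurlencode($tweet['from_user'])
             . '/status/' . $tweet['id'] . '/';
        $html .= '<dd><a href="' . htmlspecialchars($url) . '">'
               . htmlspecialchars($tweet['text'], ENT_QUOTES, 'UTF-8', false)
               . ' &mdash; ' . htmlspecialchars($tweet['from_user']) . '</a></dd>';
    }

    return $html . '</dl>';
}
```

With this in place, the display loop reduces to `$output .= render_keyword($keywords[$i], curl_multi_getcontent($ch[$i]));`, and the markup can be checked against a hand-written response body.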

about hb stone

I'm a Front-End Engineer at Yahoo! working on the Mail and Messenger teams. I blog about web design and development topics including accessibility, usability, performance, and developing HTML / CSS / JavaScript applications on Appcelerator Titanium and Adobe AIR.

If you're a web developer, you might enjoy Jelo, my JavaScript library.

copyright

All original work on this site is covered by a Creative Commons Attribution 3.0 license unless otherwise specified.

You may share or use any code or images from this site in any manner, for free, so long as reasonable effort has been made to give credit where due.

The views expressed in the posts and comments on this blog do not necessarily reflect the views of Yahoo!