PxO Ink

Twitter RSS Feed

Written By: PxO Ink on April 18th, 2012.

Twitter provides a few methods for accessing a feed of a specific user's tweets. In this article, we'll be using PHP in conjunction with the Twitter timeline. You can read more about how the timeline works here, and find API information about this specific method here. At the end of the article, a working application based on these instructions will be provided.

This system uses cURL, DOMDocument, regex and filesystem functions to acquire, parse and assemble the feed into an htm file. Knowledge of these various tools is recommended, but not necessary to understand this article and make use of the system.

Note: Twitter limits the number of RSS requests that can be made in a given period. As such, cron should be investigated so this system can run consistently without hitting that limit.
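
For example, a crontab entry along these lines would regenerate the file every fifteen minutes. The script name and path are only placeholders here; point it at wherever the packaged application actually lives:


*/15 * * * * php /path/to/crontweets.php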

Let's begin!

First, there are a few variables that need to be declared, which will house information such as the feed URI and some regex to handle proper linking and whatnot:


$document	=	new DOMDocument();
$twitter	=	"http://twitter.com/statuses/user_timeline/";
$user		=	"UserAccount";	// the account whose feed will be fetched
$limit		=	5;
$regexTUsers	=	'/@([A-Z0-9_]+)/i';
$regexHashes	=	'/#([A-Z0-9_]+)/i';
$regexLinks	=	'/((http|https)\:\/\/[A-Z0-9\-\.]+\.[A-Z]{2,6}(\/\S*)?)/i';
$regexUser	=	'/(' . preg_quote($user, '/') . ')/i';	// mentions of the account being fetched

Let's go through each variable here individually:

  • The DOMDocument library will be used to parse the feed.

  • The feed location for Twitter.

  • The account whose feed will be fetched.

  • The number of tweets that should be returned.

  • Regex to identify user accounts.

  • Regex to identify hash search tags.

  • Regex to identify links.

  • Regex to identify mentions of the account being fetched.

If the regex gets confusing, take a look at this cheat sheet.
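
To get a feel for what those patterns pick up, here's a quick sketch using the variables declared above, run against a made-up tweet (the text, account name and URL are invented for illustration):


$sample	=	"Reading about #php at http://example.com with @SomeUser";

preg_match_all($regexTUsers, $sample, $users);
preg_match_all($regexHashes, $sample, $hashes);
preg_match_all($regexLinks, $sample, $links);

echo $users[1][0];	// SomeUser
echo $hashes[1][0];	// php
echo $links[1][0];	// http://example.com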

Now that the basic variables are declared, cURL can be used to get the feed content. You can read more about cURL here. The following represents the magic of cURL:


$ch	=	curl_init();
curl_setopt($ch, CURLOPT_URL, $twitter . $user . ".rss?count=$limit");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

Here cURL is initialized and a couple of options are set. As can be seen above, the request URL is built by concatenating the feed location, the user name, the feed format and the tweet limit, and the result is set as CURLOPT_URL. CURLOPT_RETURNTRANSFER tells cURL to return the response as a string instead of printing it directly.
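
For instance, with the sample values declared above, the URL handed to cURL looks like this (the account name is just the placeholder used throughout this article):


echo $twitter . $user . ".rss?count=$limit";
// http://twitter.com/statuses/user_timeline/UserAccount.rss?count=5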


$feed	=	curl_exec($ch);

curl_close($ch);

cURL is then used to get the feed content and the connection is closed.
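
If the request fails, it helps to know why. A small, optional addition (not required by the rest of the article) is to check curl_error() before the handle is closed; a sketch of how those two lines could be expanded:


$feed	=	curl_exec($ch);

if ($feed === false)	file_put_contents('crontweets.error.log', date('m/d/Y - h:i:s A T') . ' - ' . curl_error($ch) . "\r\n", FILE_APPEND);

curl_close($ch);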


if (!$feed || !$document -> loadXML($feed)) return;

Once the feed is actually acquired, it can be loaded as an XML document. If the feed isn't acquired, or the document cannot be loaded as XML, the application is halted.
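
One thing to be aware of: loadXML() emits PHP warnings when it's handed malformed input. Since this script is meant to run unattended from cron, it can be tidier to let libxml collect those errors quietly; a minimal sketch:


libxml_use_internal_errors(true);	// collect XML errors instead of emitting warnings

if (!$feed || !$document -> loadXML($feed)) return;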

Now comes the real meat of the application. This block of code essentially traverses the feed (updated June 14th, 2012):


foreach ($document -> getElementsByTagName('item') as $node) {
	if (!empty($node -> getElementsByTagName('error') -> item(0) -> nodeValue))	{
		file_put_contents('crontweets.error.log', date('m/d/Y - h:i:s A T') . ' - ' . $node -> getElementsByTagName('error') -> item(0) -> nodeValue . "\r\n", FILE_APPEND);
		
		$errors	=	true;
	} else {
		$tweet[]	=	date('D, M dS, Y - h:i:s A T', strtotime($node -> getElementsByTagName('pubDate') -> item(0) -> nodeValue)) . '<br />';
		$tweet[]	=	htmlentities(end($tweet) . $node -> getElementsByTagName('title') -> item(0) -> nodeValue);
		$tweet[]	=	html_entity_decode(preg_replace('/&Acirc;&nbsp;/', ' ', end($tweet)));
		$tweet[]	=	preg_replace($regexLinks, '<a href="$1">$1</a>', end($tweet));
		$tweet[]	=	preg_replace($regexUser, '<a href="' . $node -> getElementsByTagName('link') -> item(0) -> nodeValue . '" target="_blank">$1</a>', end($tweet));
		$tweet[]	=	preg_replace($regexTUsers, '<a href="http://twitter.com/$1" target="_blank">@$1</a>', end($tweet));
		$tweet[]	=	preg_replace($regexHashes, '<a href="http://search.twitter.com/search?q=$1" target="_blank">#$1</a>', end($tweet));
		$tweets	.=	'<li>' . end($tweet) . '</li>';		
	}
}

Let's dissect each aspect:


foreach ($document -> getElementsByTagName('item') as $node) {
	/*...*/
}

This is a loop which essentially states, "for each 'item' element within the document." The feed is full of items in this form:


  <item>
    <title>UserAccount: Title of the Tweet</title>
    <description>UserAccount: Description of the Tweet</description>
    <pubDate>Wed, 18 Apr 2012 00:00:00 +0000</pubDate>
    <guid>http://twitter.com/UserAccount/statuses/000000000000000000</guid>
    <link>http://twitter.com/UserAccount/statuses/000000000000000000</link>
    <twitter:source/>
    <twitter:place/>
  </item>

This application is concerned with the title, date and link of the tweet.
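
As a small stand-alone illustration of how DOMDocument exposes those three pieces, here is a sketch run against a trimmed-down item (the values are the same placeholders as in the sample above):


$item	=	new DOMDocument();

$item -> loadXML('<item>
	<title>UserAccount: Title of the Tweet</title>
	<pubDate>Wed, 18 Apr 2012 00:00:00 +0000</pubDate>
	<link>http://twitter.com/UserAccount/statuses/000000000000000000</link>
</item>');

echo $item -> getElementsByTagName('title') -> item(0) -> nodeValue;	// UserAccount: Title of the Tweet
echo $item -> getElementsByTagName('pubDate') -> item(0) -> nodeValue;	// Wed, 18 Apr 2012 00:00:00 +0000
echo $item -> getElementsByTagName('link') -> item(0) -> nodeValue;	// http://twitter.com/UserAccount/statuses/000000000000000000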


	if (!empty($node -> getElementsByTagName('error') -> item(0) -> nodeValue))	{
		/*...*/
	} else {
		/*...*/
	}

Before any parsing of the feed begins, the first thing that is checked is whether or not there's an error being reported by Twitter.


file_put_contents('crontweets.error.log', date('m/d/Y - h:i:s A T') . ' - ' . $node -> getElementsByTagName('error') -> item(0) -> nodeValue . "\r\n", FILE_APPEND);
		
$errors	=	true;

If there is an error, it's best to output that to a separate log file and set a flag that an error has occurred.

Assuming no error occurs, each piece of the tweet is dissected and rebuilt using HTML. To make this easy to modify, the string is built up step by step in an array.
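
The pattern is simply: push each intermediate result onto the array, then use end() to read back the most recent one on the next step. A tiny sketch of the idea (the values here are arbitrary):


$steps	=	array();

$steps[]	=	'one';
$steps[]	=	end($steps) . ', two';
$steps[]	=	end($steps) . ', three';

echo end($steps);	// one, two, three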


$tweet[]	=	date('D, M dS, Y - h:i:s A T', strtotime($node -> getElementsByTagName('pubDate') -> item(0) -> nodeValue)) . '<br />';

This converts the publication date into something a bit more readable.
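
For example, the pubDate from the sample item above comes out like this (the timezone shown depends on the server's settings; UTC is assumed here):


echo date('D, M dS, Y - h:i:s A T', strtotime('Wed, 18 Apr 2012 00:00:00 +0000'));
// Wed, Apr 18th, 2012 - 12:00:00 AM UTC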


$tweet[]	=	htmlentities(end($tweet) . $node -> getElementsByTagName('title') -> item(0) -> nodeValue);	

This captures the title of the tweet and encodes it (this is very important!)
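
Encoding matters because the tweet text ends up inside an HTML page; without it, anything in a tweet that looks like markup would be rendered (or executed) rather than displayed. A quick illustration with an invented tweet:


echo htmlentities('<script>alert("oops")</script> just tweeting along');
// &lt;script&gt;alert(&quot;oops&quot;)&lt;/script&gt; just tweeting along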


$tweet[]	=	html_entity_decode(preg_replace('/&Acirc;&nbsp;/', ' ', end($tweet)));

Twitter has a strange fascination with using non-breaking spaces (&#160;) instead of regular spaces (plain spaces would work too, Twitter.) When the text is run through htmlentities, each one shows up as '&Acirc;&nbsp;', which renders as 'Â '. So, we're encoding the text, replacing those entities with actual spaces, then decoding it.

"Why go through so much trouble when they look just like regular spaces?" Well, they're not considered whitespace by PHP, and so regular expressions against them wouldn't really work (thus our hyperlinks would not render correctly.)


$tweet[]	=	preg_replace($regexLinks, '<a href="$1">$1</a>', end($tweet));

Regex is used to find any written URI and transform it into a hyperlink.


$tweet[]	=	preg_replace($regexUser, '<a href="' . $node -> getElementsByTagName('link') -> item(0) -> nodeValue . '" target="_blank">$1</a>', end($tweet));

Regex is then used to find any mention of the user's account, and provide a link back to this tweet.


$tweet[]	=	preg_replace($regexTUsers, '<a href="http://twitter.com/$1" target="_blank">@$1</a>', end($tweet));

Regex is used again to find any references to another twitter user, and provide a link back to their account page.


$tweet[]	=	preg_replace($regexHashes, '<a href="http://search.twitter.com/search?q=$1" target="_blank">#$1</a>', end($tweet));

Regex is now used to locate any hashtag reference and provide a proper search link of that hashtag.


$tweets	.=	'<li>' . end($tweet) . '</li>';

The final step of parsing takes the fully built string and appends it to a single variable, which will be output to the htm file. Note that each tweet is wrapped in an HTML list item (<li>) element, so the final output is meant to sit inside a <ul> or <ol>.


if (empty($errors))	file_put_contents(dirname(__FILE__) . '/../tweets.htm', $tweets);

If no errors occurred, the assembled list is then saved to an htm file, which can be easily included using PHP. You can read more about that here.
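
For example, a page template could wrap the generated list items in a <ul> and pull the file in directly (the path here is relative to the including page, so adjust it to wherever tweets.htm was written above):


<ul>
	<?php include 'tweets.htm'; ?>
</ul>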

As promised, you can download a pre-packaged application for free, here. Thank you for reading and if you have any questions, comments, suggestions or concerns, you can send an email.