Reading an RSS Feed

Because RSS is an XML application, any XML library, from SAX through dom4j, can parse RSS documents. However, because there are so many versions of RSS, a dedicated RSS library can be especially useful when you need to parse RSS feeds. (For the same reason, data binding is a poor technique to use with RSS.)
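As a rough illustration that a general-purpose XML API is enough for simple cases, the sketch below uses only the JDK's built-in SAX parser to collect item titles. The class and method names (SaxItemTitles, itemTitles) are invented for this example, and matching on the local names item and title is an assumption: it covers the item elements of RSS 0.9x, 1.0, and 2.0, but not Atom (which uses entry elements), and it ignores the version quirks an RSS library would handle for you.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: plain SAX, no RSS library involved.
class SaxItemTitles {

    /** Returns the text of every <title> found inside an <item>, in document order. */
    static List<String> itemTitles(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // match on local names; RSS 1.0 items live in a namespace
        SAXParser parser = factory.newSAXParser();

        final List<String> titles = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inItem;
            private StringBuilder text; // non-null only while inside an item's <title>

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("item".equals(localName)) {
                    inItem = true;
                } else if (inItem && "title".equals(localName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) {
                    text.append(ch, start, length); // may be called several times per element
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("item".equals(localName)) {
                    inItem = false;
                } else if (text != null && "title".equals(localName)) {
                    titles.add(text.toString().trim());
                    text = null;
                }
            }
        };
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return titles;
    }

    public static void main(String[] args) throws Exception {
        String rss = "<rss version=\"2.0\"><channel><title>Demo</title>"
                + "<item><title>First</title></item>"
                + "<item><title>Second</title></item>"
                + "</channel></rss>";
        System.out.println(itemTitles(rss)); // [First, Second]
    }
}
```

The channel's own title is skipped because it appears outside any item element.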

Feed Input with ROME

Just as ROME has SyndFeedOutput and WireFeedOutput classes for writing syndicated feeds, it includes SyndFeedInput and WireFeedInput classes for reading them. These input classes can read feeds from a variety of sources:

  • A java.io.File object

  • A java.io.Reader object

  • An org.xml.sax.InputSource object

  • An org.w3c.dom.Document object

  • An org.jdom.Document object

When parsing a feed, WireFeedInput determines what type of feed it is by looking at elements, attributes, and namespaces defined in the feed. SyndFeedInput delegates parsing to WireFeedInput and then converts the resulting WireFeed object to a SyndFeed object.
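To make the idea of detecting a feed's type from its markup concrete, here is a simplified, hypothetical sniffer written against the JDK's DOM API. It is not ROME's actual code: the class name, the order of the checks, and the returned strings (modeled on type names such as rss_1.0 and rss_2.0 that appear in the program output later in this section) are assumptions, and ROME makes finer distinctions, for example between the Netscape and UserLand flavors of RSS 0.91.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical, simplified feed-type detection; not ROME's implementation.
class FeedTypeSniffer {

    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String RSS_10_NS = "http://purl.org/rss/1.0/";
    static final String RSS_090_NS = "http://my.netscape.com/rdf/simple/0.9/";
    static final String ATOM_10_NS = "http://www.w3.org/2005/Atom";
    static final String ATOM_03_NS = "http://purl.org/atom/ns#";

    /** Returns a ROME-style type name, or null if the format is not recognized. */
    static String detectFeedType(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed for getNamespaceURI()/getLocalName()
        Element root = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();

        String ns = root.getNamespaceURI(); // null when the root element has no namespace
        String name = root.getLocalName();

        if ("rss".equals(name) && ns == null) {
            // RSS 0.91 through 2.0: the version is an attribute, not a namespace
            String version = root.getAttribute("version"); // "" if the attribute is missing
            return version.isEmpty() ? null : "rss_" + version;
        }
        if ("RDF".equals(name) && RDF_NS.equals(ns)) {
            // RDF-based RSS: the namespace of <channel> separates 0.90 from 1.0
            if (root.getElementsByTagNameNS(RSS_10_NS, "channel").getLength() > 0) {
                return "rss_1.0";
            }
            if (root.getElementsByTagNameNS(RSS_090_NS, "channel").getLength() > 0) {
                return "rss_0.9";
            }
        }
        if ("feed".equals(name) && ATOM_10_NS.equals(ns)) {
            return "atom_1.0";
        }
        if ("feed".equals(name) && ATOM_03_NS.equals(ns)) {
            return "atom_0.3";
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(detectFeedType("<rss version=\"2.0\"><channel/></rss>")); // rss_2.0
    }
}
```

Note that RSS 0.9x/2.0 and RDF-based RSS cannot be told apart by namespace alone, which is why the sketch combines element names, an attribute, and namespaces.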

Building a Simple Aggregator

To demonstrate these feed-reading capabilities, let's build a simple command-line RSS and Atom aggregator. The aggregator is passed a list of feed URLs on the command line and outputs the entries from those feeds as a single numbered list. The user can then select one entry from the list to see its title, description, and link. Because we want this aggregator to support both RSS and Atom, we'll use SyndFeedInput and the classes in the com.sun.syndication.feed.synd package. The listing below, "Framework aggregator code," contains the skeleton of our SimpleAggregator class.

Framework aggregator code

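package javaxml3;

// The I/O, collection, SAX, and extra ROME imports are used by the
// methods added to this class in the rest of this section.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

import org.xml.sax.InputSource;

import com.sun.syndication.feed.synd.SyndEntry;
import com.sun.syndication.feed.synd.SyndFeed;
import com.sun.syndication.io.FeedException;
import com.sun.syndication.io.SyndFeedInput;

public class SimpleAggregator {

    private SyndFeedInput feedInput;

    public SimpleAggregator() {
        feedInput = new SyndFeedInput();
    }

    private void run(String[] args) {
        System.out.println("Welcome to the Simple Aggregator.");

        List allEntries = loadFeeds(args);

        System.out.println("Done loading feeds.");
        System.out.println();

        System.out.println("Please choose an entry below:");

        outputMenu(allEntries);

        // acceptUserChoice() returns a 1-based menu number, or 0 for invalid input
        int choice = acceptUserChoice(allEntries.size());
        if (choice > 0) {
            System.out.println("You chose entry #:" + choice);
            // Menu numbers start at 1, but List indexes start at 0
            SyndEntry entry = (SyndEntry) allEntries.get(choice - 1);
            outputEntry(entry);
        }
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.err.println("Usage: java javaxml3.SimpleAggregator [URLs]");
            return;
        }

        SimpleAggregator aggregator = new SimpleAggregator();
        aggregator.run(args);
    }

}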

To load the feeds, we'll loop through the command-line arguments and build a SyndFeed object for each URL. To keep our output simple, we add at most the first five entries from each feed to our entry menu:

private List loadFeeds(String[] feedURLs) {
    List allEntries = new ArrayList();
    for (int i = 0; i < feedURLs.length; i++) {
        String feedURL = feedURLs[i];
        System.out.println("Loading feed from: " + feedURL + ".....");
        SyndFeed feed;
        try {
            // An InputSource created from a URL string treats it as the
            // system ID, so the parser fetches the document from that URL
            feed = feedInput.build(new InputSource(feedURL));
        } catch (IllegalArgumentException e) {
            System.err.println("Unable to parse feed from: " + feedURL);
            e.printStackTrace();
            continue; // skip this feed rather than use a null feed below
        } catch (FeedException e) {
            System.err.println("Unable to parse feed from: " + feedURL);
            e.printStackTrace();
            continue;
        }
        System.out.println("Found a feed of type " + feed.getFeedType());
        System.out.println("Feed title: " + feed.getTitle());

        List entryList = feed.getEntries();
        if (!entryList.isEmpty()) {
            // Keep at most the first five entries of each feed
            int entryListLength = Math.min(entryList.size(), 5);
            entryList = entryList.subList(0, entryListLength);
            allEntries.addAll(entryList);
        }
    }
    return allEntries;
}
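The at-most-five-per-feed rule in loadFeeds() comes down to Math.min() plus List.subList(). The generic helper below is a hypothetical restatement of that logic with no ROME dependency, so it can be tried on plain strings; EntryLimiter and firstNFromEach are invented names, not part of ROME or of the aggregator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper mirroring the Math.min/subList logic in loadFeeds().
class EntryLimiter {

    /** Concatenates at most n leading elements from each source list. */
    static <T> List<T> firstNFromEach(List<? extends List<? extends T>> sources, int n) {
        List<T> result = new ArrayList<T>();
        for (List<? extends T> source : sources) {
            int count = Math.min(source.size(), n);
            // subList() is a view backed by 'source'; addAll() copies the
            // elements, so 'result' does not depend on 'source' afterward.
            result.addAll(source.subList(0, count));
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> feeds = Arrays.asList(
                Arrays.asList("a1", "a2", "a3", "a4", "a5", "a6", "a7"),
                Arrays.<String>asList(),
                Arrays.asList("b1", "b2"));
        System.out.println(firstNFromEach(feeds, 5)); // [a1, a2, a3, a4, a5, b1, b2]
    }
}
```

Because subList(0, 0) is legal, an empty feed simply contributes nothing; the isEmpty() check in loadFeeds() is a guard rather than a necessity.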

Our menu contains an index and the title for each feed entry:

private void outputMenu(List allEntries) {
    for (int i = 0; i < allEntries.size(); i++) {
        SyndEntry entry = (SyndEntry) allEntries.get(i);
        System.out.println("#" + (i + 1) + ": " + entry.getTitle());
    }
}

Finally, the methods that accept the user's selection and output the chosen entry are fairly simple:

private int acceptUserChoice(int menuLength) {
    BufferedReader reader = new BufferedReader(new InputStreamReader(
            System.in));
    System.out.print(">");
    String choice = null;
    try {
        choice = reader.readLine();
    } catch (IOException e) {
        System.err.println("Could not read choice.");
        return 0;
    }
    if (choice == null) // end of input
        return 0;
    try {
        int choiceInt = Integer.parseInt(choice.trim());
        // Only 1..menuLength are valid menu numbers; 0 means "no choice"
        if (choiceInt < 1 || choiceInt > menuLength)
            return 0;
        else
            return choiceInt;
    } catch (NumberFormatException e) {
        return 0;
    }
}

private void outputEntry(SyndEntry entry) {
    System.out.println("Title: " + entry.getTitle());
    // Not every entry has a description, so guard against null
    if (entry.getDescription() != null)
        System.out.println("Description: " + entry.getDescription().getValue());
    System.out.println("Link: " + entry.getLink());
}
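The mismatch between the 1-based menu and the 0-based List is easy to get wrong, so it can help to pull the parsing logic into a pure function that is testable without a console. This is a hypothetical refactoring (MenuChoice, parse, and toListIndex are invented names) rather than code from the aggregator; it keeps the same convention of returning 0 for any invalid input.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, console-free version of the choice-parsing logic.
class MenuChoice {

    /**
     * Converts a line of user input into a 1-based menu number, or 0 if the
     * input is missing, not a number, or outside 1..menuLength.
     */
    static int parse(String line, int menuLength) {
        if (line == null) {
            return 0; // BufferedReader.readLine() returns null at end of input
        }
        try {
            int choice = Integer.parseInt(line.trim());
            return (choice >= 1 && choice <= menuLength) ? choice : 0;
        } catch (NumberFormatException e) {
            return 0;
        }
    }

    /** Menu numbers start at 1, but List indexes start at 0. */
    static int toListIndex(int menuNumber) {
        return menuNumber - 1;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList("first", "second", "third");
        int choice = parse(" 2 ", titles.size());
        if (choice > 0) {
            System.out.println(titles.get(toListIndex(choice))); // second
        }
    }
}
```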

Here's the output of this class when I passed it the URLs for Slashdot's RSS feed and an RSS feed from BBC News:

Welcome to the Simple Aggregator.
Loading feed from: http://rss.slashdot.org/Slashdot/slashdot.....
Found a feed of type rss_1.0
Feed title: Slashdot
Loading feed from:
http://news.bbc.co.uk/rss/newsonline_world_edition/front_page/rss091.xml.....
Found a feed of type rss_2.0
Feed title: BBC News | News Front Page | World Edition
Done loading feeds.

Please choose an entry below:
#1: Stem Cells - The Hope and the Hype
#2: 50th Anniversary of the First Hard Drive
#3: Knock Some Commands Into Your Laptop
#4: The End of E3?
#5: Fun Things To Do With Your Honeypot System
#6: Israel halts fire for Qana probe
#7: Polls close in Congo elections
#8: Rally challenges Mexico poll
#9: Bosnia Muslim sentence contested
#10: Seychelles head wins re-election
>2
You chose entry #:2
Title: 50th Anniversary of the First Hard Drive
Description: ennuiner writes "Over at Newsweek Steven Levy has a column
commemorating IBM's introduction of the first hard drive 50 years ago.
Link: http://rss.slashdot.org/~r/Slashdot/slashdot/~3/7395897/article.pl

As you can see, the menu isn't quite right: it lists all the Slashdot entries followed by all the BBC News entries. We want the aggregator's menu to merge the entries from both feeds by date, newest first. To sort the entries, we use an implementation of the java.util.Comparator interface that compares entries based on the result of the getPublishedDate() method. We have to be careful to deal with cases where getPublishedDate() returns null, including the case where it returns null for both entries being compared:

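private void sort(List entryList) {
    // Newest entries first; entries without a published date go to the top
    Collections.sort(entryList, new Comparator() {

        public int compare(Object arg0, Object arg1) {
            SyndEntry entry0 = (SyndEntry) arg0;
            SyndEntry entry1 = (SyndEntry) arg1;
            boolean undated0 = entry0.getPublishedDate() == null;
            boolean undated1 = entry1.getPublishedDate() == null;
            // Two undated entries must compare as equal; otherwise the
            // comparator is inconsistent and the sort may fail
            if (undated0 && undated1)
                return 0;
            if (undated0)
                return -1;
            if (undated1)
                return 1;

            // Reverse the natural Date order so later dates sort first
            return entry1.getPublishedDate().compareTo(
                    entry0.getPublishedDate());
        }
    });
}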
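On Java 8 or later, the same ordering can be expressed without hand-written null checks by composing Comparator.comparing(), Comparator.nullsFirst(), and Comparator.reverseOrder(). The sketch below assumes Java 8, and so that it compiles without ROME it sorts a hypothetical Entry class holding just a title and a nullable Date in place of SyndEntry. A useful side effect is that nullsFirst() returns 0 when both dates are missing, which keeps the comparator consistent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.Date;
import java.util.List;

class DateSortSketch {

    /** Hypothetical stand-in for SyndEntry: a title and a nullable date. */
    static final class Entry {
        final String title;
        final Date published; // may be null, like SyndEntry.getPublishedDate()

        Entry(String title, Date published) {
            this.title = title;
            this.published = published;
        }

        Date getPublished() {
            return published;
        }

        @Override
        public String toString() {
            return title;
        }
    }

    /** Undated entries first, then dated entries from newest to oldest. */
    static final Comparator<Entry> NEWEST_FIRST = Comparator.comparing(
            Entry::getPublished,
            Comparator.nullsFirst(Comparator.<Date>reverseOrder()));

    static void sortNewestFirst(List<Entry> entries) {
        Collections.sort(entries, NEWEST_FIRST); // stable: ties keep their input order
    }

    public static void main(String[] args) {
        List<Entry> entries = new ArrayList<>(Arrays.asList(
                new Entry("old", new Date(1000L)),
                new Entry("undated", null),
                new Entry("new", new Date(3000L)),
                new Entry("middle", new Date(2000L))));
        sortNewestFirst(entries);
        System.out.println(entries); // [undated, new, middle, old]
    }
}
```

With ROME on the classpath and a typed entry list, SyndEntry::getPublishedDate would presumably take the place of Entry::getPublished; swapping nullsFirst() for nullsLast() moves undated entries to the bottom of the menu instead.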

With this code in place, and with sort(allEntries) called from run() after the feeds are loaded, our aggregator produces a truly aggregated menu:

Welcome to the Simple Aggregator.
Loading feed from: http://rss.slashdot.org/Slashdot/slashdot.....
Found a feed of type rss_1.0
Feed title: Slashdot
Loading feed from:
http://news.bbc.co.uk/rss/newsonline_world_edition/front_page/rss091.xml.....
Found a feed of type rss_2.0
Feed title: BBC News | News Front Page | World Edition
Done loading feeds.

Please choose an entry below:
#1: Israel halts fire for Qana probe
#2: Seychelles head wins re-election
#3: Stem Cells - The Hope and the Hype
#4: Bosnia Muslim sentence contested
#5: 50th Anniversary of the First Hard Drive
#6: Knock Some Commands Into Your Laptop
#7: Rally challenges Mexico poll
#8: The End of E3?
#9: Fun Things To Do With Your Honeypot System
#10: Polls close in Congo elections
>2
You chose entry #:2
Title: Seychelles head wins re-election
Description: The incumbent president of Seychelles, James Michel, wins another five
years in office in presidential elections.
Link: http://news.bbc.co.uk/go/rss/-/2/hi/africa/5230012.stm


