me, 2.0: jose nazario
beauty and the street
doing more with RSS
as promised, something tech related.
in a nutshell, the above is part of the visualization of my RSS
clustering work. RSS clustering is a technique i've been using for
over a year now (i didn't invent it, but i did implement it kind
of naively) as a way of doing two main things:
- reducing the redundancy of the information within the feeds
- discovering what's interesting
to accomplish this, you gather a pile of related feeds and break it
into terms, find the most highly linked terms, and group around them.
i'm doing this naively, as i said above, but it's been a gateway to
more mature CS and methods than i would have expected.
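
a minimal sketch of the gather, split into terms, and group steps described above, in python. this is an illustration under stated assumptions, not the code behind this post: the stopword list, the `terms` and `cluster` names, and the "file each headline under its highest-ranked shared term" rule are all made up for the example, and "most highly linked" is approximated here by "appears in the most headlines".

```python
import re
from collections import Counter, defaultdict

# hypothetical stopword list; a real one would be much longer
STOPWORDS = {"a", "an", "and", "as", "at", "for", "in", "of", "on", "the", "to", "with"}


def terms(title):
    """lowercase a headline and return its set of non-stopword terms."""
    return {w for w in re.findall(r"[a-z]+", title.lower()) if w not in STOPWORDS}


def cluster(titles, top_n=3):
    """group headlines around the terms that appear in the most headlines."""
    term_sets = [terms(t) for t in titles]
    counts = Counter(term for ts in term_sets for term in ts)
    # rank by count, breaking ties alphabetically so the result is repeatable
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
    # a term only works as a hub if at least two headlines share it
    hubs = [term for term, n in ranked if n > 1]
    groups = defaultdict(list)
    for title, ts in zip(titles, term_sets):
        for hub in hubs:
            if hub in ts:
                groups[hub].append(title)
                break  # file each headline under its highest-ranked hub only
    return dict(groups)
```

headlines that share no hub term are simply left out of the result; a fuller version would probably keep them in a leftover bucket.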
the RSS reader model cannot scale to hundreds of feeds (which is
what i find myself using every day). you simply cannot parse that
much information, even if the fetching has been automated, without
suffering from overload and the eventual numbness. at that point,
you're essentially back to square one (minus the effort of finding
sites to visit and finding the new items). i have much more to do
with my days than stare at my newsreader.
what clustering does is reduce the redundancy inherent in the data
stream. in the case of world news from dozens of outlets, they'll
often be talking about the same topics: conflicts, politics, science,
events, and the like. and they'll often be using the same terms to
do this. so, if you can find the overlap and reduce the visibility of
the repeats, you've streamlined the process some.
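
one way to picture "finding the overlap" is to measure how many terms two headlines share and hide the near-duplicates. the sketch below uses jaccard similarity for that; it is a swapped-in illustration, not necessarily what this post's code does, and the `collapse` name and the 0.5 threshold are assumptions picked for the example.

```python
import re


def term_set(text):
    """lowercase a headline and return its set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """shared terms divided by all terms: 0.0 is disjoint, 1.0 is identical."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def collapse(headlines, threshold=0.5):
    """keep a headline only if it doesn't overlap heavily with one already kept."""
    kept = []
    for headline in headlines:
        ts = term_set(headline)
        if all(jaccard(ts, term_set(k)) < threshold for k in kept):
            kept.append(headline)
    return kept
```

this compares every new headline against everything already kept, so it is quadratic in the number of items; fine for a sketch, probably too slow for hundreds of busy feeds.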
now take this a step further. you know how many hits for any term or
topic you have, so you can rank them by popularity or by how linked
any of these topics are within your data set. so, you can order the
presentation of the data using that information and make your surfing
more efficient. have a look at what the hot topic of the hour is,
using the inherent operations of the world news organizations to
act as a collaborative filter.
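
the two orderings mentioned above (by number of hits, and by how linked a term is) can be sketched like this. "linked" is interpreted here as the number of distinct other terms a term shows up alongside, which is an assumption for the example; `rank_terms` is a hypothetical name, and the input is assumed to be one set of terms per feed item.

```python
from collections import Counter, defaultdict
from itertools import combinations


def rank_terms(items):
    """items: one set of terms per feed item. returns (by_hits, by_links)."""
    hits = Counter()
    neighbors = defaultdict(set)
    for ts in items:
        hits.update(ts)  # one hit per item the term appears in
        for a, b in combinations(sorted(ts), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    # ties are broken alphabetically so the ordering is repeatable
    by_hits = sorted(hits, key=lambda t: (-hits[t], t))
    by_links = sorted(hits, key=lambda t: (-len(neighbors[t]), t))
    return by_hits, by_links
```

the two lists can disagree: a term that appears only once but in a long, busy item can outrank a more frequent term on linkage.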
like i said, i've been doing this for over a year now with world
news. it's effective, scales well, and provides more information
beyond the headlines and blurbs syndicated. and now i'm sharing some
of the gory details.
Last modified: Friday, Sep 03, 2004 @ 07:36am
copyright © 2002-2005 jose nazario, all rights reserved.