diff --git a/rss-cli.html b/rss-cli.html new file mode 100644 index 0000000..57d170e --- /dev/null +++ b/rss-cli.html @@ -0,0 +1,211 @@ +
+
+

So if you actually do read this site often, you may have noticed that there +is now an RSS feed. It's on the main posts page, up at the top right. RSS is a +very interesting technology. It was designed with the intent, it seems, to +connect the whole internet in one nice syndication protocol that was easy to +understand and use. And it really delivered on that! It just never +caught on as much as was once expected. However, I still use +newsboat for most of my YouTube and other feeds, and it works great. +Despite being 'dead', most things support it (or you can find a +tool to make it +work).

+ +

With that said, it seems like it would be great to automate stuff with it, +unix style. Pipes, bash scripting, the whole deal. However, I didn't really +find anything that fit my needs. I just wanted a light, simple-to-use program +that could extract things from an rss feed and spit them out, to be further +processed by something like awk. Alas, my searching turned up +nothing. So I pulled up a tmux session, put on some music at full volume, and one +weekend later we now have rss-cli!

+
+
+
+ +
This site hosts a +valid rss feed.
+
+
+
+ +
+
+
+ + +
You can view the code I'm describing yourself on github + (above) and + this site.
+
+
+
+

How does it work?

+ +

rss-cli uses the very fast C++ library +rapidxml to parse the RSS feed. Total execution time tends to be around 3ms +for very large RSS feeds (~30 items) on my i7-9750H. I was +getting about 10ms on a Raspberry Pi 4 for the same feed.

+ +

rss-cli parses an rss file, which is identified by a URI. A URI is +used because the program fetches rss feeds off the internet with libcurl; +file:///some/rss/feed.rss is also valid, for local files. Once the file +is grabbed, it is parsed by rapidxml, then kept in memory. Individual +elements are only looked up when they are requested. This lazy-loading approach keeps +execution times low, as often you will not need the entire feed: you will +probably only be extracting key bits of information for your next program to +parse.

+ +

All of the meat of rss-cli is in the rss_utils namespace. I placed it +there, along with an rss_utils::rss object for interacting with the rss feed, so +that moving rss.cpp and rss.hpp to your own project can be as easy as possible. +rss_utils also contains an rss_utils::item, which represents the <item> tags. +These item objects are stored in a std::vector, so that your program can easily +iterate through them.

+ +

Both rss_utils::rss and rss_utils::item contain clone functions, the big three +(copy constructor, copy assignment operator, and destructor), +and accessor functions for all of the possible associated elements. For +example, if you want to access an rss feed's title, you would call:

+ +
std::string rss_utils::rss::getTitle() const
+ +

All responses are given as std::string, to allow for the widest +compatibility possible. Each time one of these functions is called, it will +search the document for the corresponding element, and return an empty string +(std::string("")) if nothing is found. Neither of the classes ever throws +exceptions. rss_utils::rss also provides an isOk() function for checking if the +rss feed was valid. If isOk() returns false, all accessor functions will return +empty strings. When attempting to get items while isOk() is false, an empty +std::vector<rss_utils::item> will be returned.

+
+
+ +
+
+

How do you use rss-cli?

+ +

rss-cli provides the --help flag to display all of the options it will +accept. There are a lot of options, but this is because each option corresponds +to a field in the RSS 2.0 Spec. Here is a full version of the help menu (as of +7-26-21):

+ +
Usage: rss-cli [-u FEED_URI] [CHANNEL FLAGS] [-i ITEM_INDEX] [ITEM FLAGS] + Options: + Required Options: + [-u, --uri] URI URI of the rss stream + + Channel information: + [-t, --title] Get title of channel + [-l, --link] Get link to channel + [-d, --description] Get description of channel + [-L, --language] Get language code of channel + [-m, --webmaster] Get webMaster's email + [-c, --copyright] Get copyright + [-p, --pubdate] Get publishing date + [-e, --managingeditor] Get managing editor + [-g, --generator] Get generator of this feed + [-o, --docs] Get link to RSS documentation + [-w, --ttl] Get ttl, time that channel can be + cached before being updated + [-b, --builddate] Get last time the channel's + content changed + [-Q, --imageurl] Get channel image URL + [-I, --imagetitle] Get image title, same as ALT in html + [-E, --imagelink] Get link to site, image will act as a link + [-W, --imagewidth] Get width of image + [-H, --imageheight] Get height of image + [-D, --clouddomain] Get domain of feed update service + [-P, --cloudport] Get port of feed update service + [-A, --cloudpath] Get path to access for feed update service + [-R, --cloudregister] Get register procedure for feed update service + [-O, --cloudprotocol] Get protocol feed update service uses + [-i, --item] INDEX Provide index of item to display + If no index is provided, assume the first + item in the feed. 
All following flags will + be parsed as item options, till another + item is provided + + Item options: + [-t, --title] Get title of item + [-l, --link] Get link + [-d, --description] Get description + [-a, --author] Get author + [-C, --category] Get category list + [-f, --comments] Get link to comments + [-G, --guid] Get GUID + [-p, --pubdate] Get publishing date + [-s, --source] Get source of item + [-U, --enclosureurl] Get enclosure URL + [-T, --enclosuretype] Get enclosure MIME type + [-K, --enclosurelength]Get enclosure length, in bytes + + General options: + [-h, --help] Show this message + + For more information, refer to the RSS 2.0 documentation + https://validator.w3.org/feed/docs/rss2.html +
+ +

Breaking this down, we first need the -u flag to say where to get the RSS +feed. Once we have that, we can pass flags to grab everything we need. The +Channel information flags have to be passed before the item options. +Once the -i flag has been passed, all following options must be item options, +and will be applied to that item. If -h is passed anywhere, the program will +display the help message and quit.
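
Since each -i starts a new item and the flags after it apply to that item, the item part of a command line can be generated in a loop. Here is a minimal shell sketch of that idea. item_args is a hypothetical helper, not part of rss-cli, and it assumes the attached-index form (-i0) and bundled short flags (-td) that the BBC example in this post uses.

```shell
# item_args is a hypothetical helper (not part of rss-cli).
# Usage: item_args COUNT FLAGS
# Prints "-i0 FLAGS -i1 FLAGS ..." for the first COUNT items.
item_args() {
    count=$1
    flags=$2
    i=0
    args=
    while [ "$i" -lt "$count" ]; do
        # Each -iN opens a new item; FLAGS then apply to that item.
        args="$args -i$i $flags"
        i=$((i + 1))
    done
    # Strip the single leading space left by the first iteration.
    printf '%s\n' "${args# }"
}

# Example (assumes rss-cli is on PATH; the expansion is left unquoted
# on purpose so the shell splits it into separate arguments):
#   rss-cli -u file:///my/local/rss.rss -t $(item_args 3 -td)
```

Channel flags still go before the generated part, because everything after the first -i is treated as an item option.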

+ +

The slowest part of the program will be fetching the file using libcurl, +therefore if you plan to do several operations on the same feed, I recommend +downloading the file first and using file:// to tell rss-cli where the file +is.
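
A minimal sketch of that download-once workflow, assuming curl is installed. fetch_feed is a hypothetical helper, not part of rss-cli, and the destination path must be absolute so that the resulting file:// URI is valid.

```shell
# fetch_feed is a hypothetical helper (not part of rss-cli).
# Usage: fetch_feed URL ABSOLUTE_DEST_PATH
# Downloads the feed once and prints a file:// URI pointing at the copy.
fetch_feed() {
    url=$1
    dest=$2
    # -f: fail on HTTP errors, -sS: quiet but still show errors,
    # -L: follow redirects, -o: write the body to DEST.
    curl -fsSL -o "$dest" "$url" || return 1
    printf 'file://%s\n' "$dest"
}

# Example (assumes rss-cli is on PATH):
#   uri=$(fetch_feed http://feeds.bbci.co.uk/news/world/us_and_canada/rss.xml /tmp/bbc.xml)
#   rss-cli -u "$uri" -t
#   rss-cli -u "$uri" -i0 -td
```

Only the first command touches the network; every later rss-cli call reads the local copy.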

+ +

Output is always printed in the order the options are listed in --help, +regardless of the order of the flags on the command line. This +means that even if you run rss-cli with:

+ +
rss-cli -u file:///my/local/rss.rss --description --link --title
+ +

The output will still be:

+ +
RSS Feed Title + Feed link + Feed description +
+ +

This makes output predictable and easy for other programs to understand. If +an empty line is encountered, then it can be assumed the requested tag is not in +the feed. This same concept applies to each item.
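
Because the order is fixed, a consumer can simply read the output line by line. Here is a minimal shell sketch for the title, link, and description example above. parse_channel is a hypothetical helper, not part of rss-cli, and the "<missing>" marker is just an illustrative choice.

```shell
# parse_channel is a hypothetical helper (not part of rss-cli).
# It reads title, link and description from stdin, in the fixed
# order rss-cli prints them, one value per line.
parse_channel() {
    IFS= read -r title
    IFS= read -r link
    IFS= read -r description
    # An empty line means the tag was not present in the feed.
    printf 'title=%s\n' "${title:-<missing>}"
    printf 'link=%s\n' "${link:-<missing>}"
    printf 'description=%s\n' "${description:-<missing>}"
}

# Example (assumes rss-cli is on PATH):
#   rss-cli -u file:///my/local/rss.rss -t -l -d | parse_channel
```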

+
+
+ +
+
+

Possible use cases

+ +
Get quick headlines in bashrc
+ +

Grab headlines from the BBC and show the top three from your bashrc.

+ +
+ echo $(rss-cli -u http://feeds.bbci.co.uk/news/world/us_and_canada/rss.xml \ + -i0 -td -i1 -td -i2 -td) +
+ +
Get weather and place into a file
+ +

Grab today's weather and put it in a file, for logging.

+ +
+ rss-cli -u http://www.rssweather.com/zipcode/10001/rss.php -i -d >> \ + "weather/$(date +%F).txt" +
+ +
Get new posts from archive.org and automatically download them
+ +

This example uses the opensource_audio collection from archive.org. It could be run from a +cron job.

+ +
+ wget "$(rss-cli -u 'https://archive.org/services/collection-rss.php?collection=opensource_audio' -i0 --enclosureurl)" -P ~/archive_audio +
+
+