Feed Creator 2.2

Feed Creator 2.2 is now available.

Feed Creator converts web pages into RSS feeds, and can merge and filter existing feeds.

What’s new?

Along with a number of bug fixes (see changelog at the end), there are some new features in this version.

We’ve also updated the documentation.

Image selector

It’s now possible to extract an image URL for each feed item.

The image URL will be used in three places in the feed output: inserted into a <media:content> element, an <atom:link rel="enclosure"> element, and a regular <img> element inside a <content:encoded> element.

In the JSON output, the image URL will be included in a property with the name “image”.
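
To make the three placements concrete, here is a minimal Python sketch that builds one RSS item by hand. It illustrates the output shape only and is not Feed Creator's own code: the helper name build_item, the sample URLs and the medium="image" attribute are assumptions made for the example. The namespace URIs are the standard ones for Media RSS, Atom and the RSS content module.

```python
import xml.etree.ElementTree as ET
from html import escape

# Standard namespace URIs for the three extension elements.
NS = {
    "media": "http://search.yahoo.com/mrss/",
    "atom": "http://www.w3.org/2005/Atom",
    "content": "http://purl.org/rss/1.0/modules/content/",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def build_item(title, link, image_url):
    """Return an <item> element carrying image_url in three places (hypothetical helper)."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    # 1. <media:content url="..."> (the medium attribute is an assumption)
    ET.SubElement(item, f"{{{NS['media']}}}content", {"url": image_url, "medium": "image"})
    # 2. <atom:link rel="enclosure" href="...">
    ET.SubElement(item, f"{{{NS['atom']}}}link", {"rel": "enclosure", "href": image_url})
    # 3. A regular <img> tag, stored as text inside <content:encoded>
    encoded = ET.SubElement(item, f"{{{NS['content']}}}encoded")
    encoded.text = f'<img src="{escape(image_url, quote=True)}" />'
    return item


item = build_item(
    "Big jump in vaccine supply is coming soon",
    "https://example.org/big-jump-in-vaccine.html",
    "https://cdn.example.org/vaccine.jpg",
)
print(ET.tostring(item, encoding="unicode"))
```

Feed readers differ in which of these elements they look at, which is a plausible reason for emitting the same URL in all three.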

Query string cleaner

You can now keep or remove query string parameters from item URLs.

The query string in a URL appears after the question mark symbol, e.g.

http://example.org/article?id=879&session=19382

The URL above has two query string parameters, named ‘id’ and ‘session’. On some sites, query string parameters identify content, and should be preserved. On others, they are used for tracking and can be stripped.

We recommend stripping non-essential query string parameters because they can affect whether feed items are treated as new or not by your feed reader.

The possible values you can use in this new field are:

  • 1 = preserve all (default)
  • 0 = remove all
  • param1,param2 = remove all except param1 and param2

In the example URL above, the ‘id’ parameter identifies the article and should be preserved, but the ‘session’ parameter is non-essential.

The site might generate a new session ID for its links the next time Feed Creator fetches the page. A feed reader could then treat the same feed items as new, because their URLs look different from before.

To prevent that happening, we can tell Feed Creator to only preserve the ‘id’ parameter by entering ‘id’ in this field.
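
The three rules can be sketched in a few lines of Python. This is an illustration of the behaviour described above, not Feed Creator's implementation; the function name clean_query_string and its keep argument are hypothetical, with keep accepting the same values as the new field.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def clean_query_string(url, keep="1"):
    """Apply a keep/strip rule to the query string of url (hypothetical helper).

    keep: "1" keeps every parameter, "0" strips them all, and anything
    else is read as a comma-separated list of parameter names to keep.
    """
    if keep == "1":
        return url
    parts = urlsplit(url)
    if keep == "0":
        kept = []
    else:
        wanted = {name.strip() for name in keep.split(",")}
        pairs = parse_qsl(parts.query, keep_blank_values=True)
        kept = [(k, v) for k, v in pairs if k in wanted]
    return urlunsplit(parts._replace(query=urlencode(kept)))


url = "http://example.org/article?id=879&session=19382"
print(clean_query_string(url, "id"))  # http://example.org/article?id=879
print(clean_query_string(url, "0"))   # http://example.org/article
print(clean_query_string(url, "1"))   # http://example.org/article?id=879&session=19382
```

With ‘id’ as the rule, a later fetch that returns session=55501 instead of session=19382 still produces http://example.org/article?id=879, so the item URL stays stable between fetches.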

Item guid

A guid is an identifier that’s usually used by feed readers to determine if a feed item is new or not. It’s not required by the RSS spec, but some feed readers might want it included.

By default, the guid is not included when you generate a feed with Feed Creator.

If you’d like it included, you can now tell Feed Creator to generate an ID based on each item’s url, title or both.

If the guid is omitted, most feed readers will use the item URL to determine if a feed item is new or not.
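
As a rough sketch of how an ID can be derived from the URL, the title, or both, consider the Python below. The mode names (‘0’, ‘url’, ‘title’, ‘url_title’) follow the guid parameter listed in the changelog; everything else, including the helper name make_guid and the use of a SHA-1 hash, is an assumption for illustration and may differ from what Feed Creator actually emits.

```python
import hashlib


def make_guid(url, title, mode="0"):
    """Return a guid string for a feed item, or None when mode is "0" (hypothetical helper)."""
    if mode == "0":
        return None  # guid omitted, matching the default
    sources = {
        "url": url,
        "title": title,
        "url_title": url + "\n" + title,
    }
    if mode not in sources:
        raise ValueError(f"unknown guid mode: {mode!r}")
    # Hashing is an assumption made here to get a short, stable ID.
    return hashlib.sha1(sources[mode].encode("utf-8")).hexdigest()


url = "http://example.org/article?id=879"
before = make_guid(url, "Vaccine supply to rise", "url")
after = make_guid(url, "Vaccine supply to rise sharply", "url")
print(before == after)  # True: a retitled article keeps its guid in "url" mode
print(make_guid(url, "Vaccine supply to rise", "url_title")
      == make_guid(url, "Vaccine supply to rise sharply", "url_title"))  # False
```

The choice of mode decides what counts as a new item: ‘url’ ignores title edits, ‘title’ ignores URL changes, and ‘url_title’ treats a change to either as a new item.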

Improved CSS support

We now use Symfony’s css-selector to convert CSS into XPath. This allows you to use more CSS selectors than before.

In addition to accepting multiple selectors (comma-separated) in the main item selector field, in version 2.2 Feed Creator accepts multiple selectors in all item fields (e.g. item title, item URL). This comes in useful when the source page uses different HTML structures to hold the items you need. For example, a news site might have the top news item appear larger than the other news items on the page, marked up with different HTML.

Let’s take the following HTML as an example. It contains 4 news items:

<div id="top-story">
  <h1>Big jump in vaccine supply is coming soon</h1>
  <img src="https://cdn.example.org/vaccine.jpg">
  <a href="/big-jump-in-vaccine.html">Read more</a>
</div>

<div class="more-stories">
  <h2><a href="/first-hearing.html">
    First hearing on Capitol riot
  </a></h2>
  <h2><a href="/white-house-promises.html">
    White House promises vaccine help
  </a></h2>
  <h2><a href="/leaders-of-texas-grid.html">
    Leaders of Texas’ grid operator resign
  </a></h2>
</div>

To select all 4 items in Feed Creator, you can enter the following into the Item selector (CSS) field:

#top-story, .more-stories h2

And in the Item title selector (CSS) field:

h1, a

Note that the main item selector can return multiple elements (4 in our example). The selectors you enter in the other item fields (item title, description, date, image, or URL) might match more than one element, but only the first matching element (based on the position in the source HTML) is used.
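
The first-match rule can be traced with a short Python sketch that uses only the standard library. It is an illustration, not Feed Creator's code: the XPath expressions are hand-written stand-ins for the translation that Symfony's css-selector performs, the helper first_match is hypothetical, and the example markup has been made well-formed (a root element added, the <img> tag self-closed) so that the XML parser accepts it.

```python
import xml.etree.ElementTree as ET

# The example markup, adjusted to be well-formed XML.
PAGE = """
<body>
<div id="top-story">
  <h1>Big jump in vaccine supply is coming soon</h1>
  <img src="https://cdn.example.org/vaccine.jpg"/>
  <a href="/big-jump-in-vaccine.html">Read more</a>
</div>
<div class="more-stories">
  <h2><a href="/first-hearing.html">First hearing on Capitol riot</a></h2>
  <h2><a href="/white-house-promises.html">White House promises vaccine help</a></h2>
  <h2><a href="/leaders-of-texas-grid.html">Leaders of Texas’ grid operator resign</a></h2>
</div>
</body>
"""

# Hand-written XPath stand-ins for the item selector "#top-story, .more-stories h2".
ITEM_PATHS = [".//div[@id='top-story']", ".//div[@class='more-stories']//h2"]
TITLE_TAGS = {"h1", "a"}  # stands in for the title selector "h1, a"


def first_match(item, tags):
    """Return the first descendant of item, in document order, whose tag is in tags."""
    for el in item.iter():
        if el is not item and el.tag in tags:
            return el
    return None


root = ET.fromstring(PAGE)
# Concatenating the results happens to give document order for this page.
items = [el for path in ITEM_PATHS for el in root.findall(path)]
titles = []
for item in items:
    match = first_match(item, TITLE_TAGS)
    titles.append("".join(match.itertext()).strip() if match is not None else None)

print(len(items))  # 4
for title in titles:
    print(title)
```

For the top story, both the <h1> and the “Read more” link match the title selector, but the <h1> comes first in the HTML, so its text becomes the title. For the other three items, only the <a> inside each <h2> matches.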

Feedback

Please try out this new version and let us know what you think.

Existing customers can download or upgrade from the customer page.

Changelog

Feed Creator 2.2.1 (2021-03-06)

  • Bug fix: Warning when running on PHP 8 and using cURL for HTTP handling (fixed in RollingCurl.php)
  • HTML5-PHP library updated to version 2.7.4
  • SimplePie library updated to version 1.5.6

Feed Creator 2.2 (2021-02-26)

  • Allow multiple comma-separated selectors in item and item_* parameters (useful when items are spread across different HTML structures)
  • Use Symfony CSS selector library – allows for more specific element targeting, e.g. a[title], p:nth-child(2), img[src*="large"], see https://developer.mozilla.org/docs/Web/CSS/Attribute_selectors
  • New item_image parameter to select an image for an item (if available)
  • Added support for attribute selector (@attr) to the item_url parameter
  • New guid parameter to specify if and how the guid element should be generated: ‘0’=ignore (default), ‘url_title’=URL+title, ‘url’=URL, ‘title’=title
  • New proxy parameter to name the proxy server to be used for the request (set up in config file)
  • New keep_qs_params parameter for cleaning the query string in item URLs: ‘1’=keep all (default), ‘0’=strip all, or a comma-separated list of field names to preserve (e.g., ‘id’ or ‘id,cat’)
  • Added Feed Control – https://feedcontrol.fivefilters.org – as a subscribe option
  • Added BazQux Reader – https://bazqux.com – as a subscribe option
  • Bug fix in mergefeeds.php: Item titles and URLs (and feed title and description) no longer double-encoded when they contain characters that need encoding
  • PHP 8 compatible
  • Other minor improvements