Feed Creator – FiveFilters.org
https://www.fivefilters.org
Web articles made accessible
Mon, 10 May 2021 10:35:25 +0000

Google News RSS feeds
https://www.fivefilters.org/2021/google-news-rss-feeds/
Thu, 06 May 2021 22:21:19 +0000

It’s possible to get a variety of RSS feeds from Google News, and they all come from Google itself, so they don’t have to be generated with third-party tools like our Feed Creator.

In this post we’ll show you how to get RSS feeds for top stories, topics, search results, and site-specific feeds. Each section will show you how to use Google News to get the news items you want, and then how to get those same results as an RSS feed.

Top stories

Google’s top stories are at: https://news.google.com/topstories

When you load that page, Google will set a country for you (e.g. US or UK) and show you the top stories for the audience of that country. The URL will change to reflect the language and country.

News for a US audience: https://news.google.com/topstories?hl=en-US&gl=US&ceid=US:en

If you’d like the top stories for a different region or in a different language, find ‘Language & region’ in the left sidebar (towards the bottom) and click it. You’ll then be able to select a different language and region.

Google News language and region
Google News language & region selection

After making your selection, the news items will update, and if you look at your browser’s address bar you’ll notice the URL has changed to reflect your selection. Google uses language codes (lowercase letters) and country codes (uppercase letters) in its URLs, as in the hl=en-US, gl=US and ceid=US:en parameters of the US example above.

RSS feeds for top stories

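To get RSS feeds for the top stories you want, simply replace ‘topstories’ in the URL with ‘rss’. For the US example above, https://news.google.com/topstories?hl=en-US&gl=US&ceid=US:en becomes https://news.google.com/rss?hl=en-US&gl=US&ceid=US:en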

You can make this URL replacement in your browser’s address bar to view the RSS source.
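If you'd rather script the replacement than edit the address bar, here is a minimal Python sketch. The function name is our own for illustration and isn't part of any Google or FiveFilters tool. It swaps the path segment and leaves the language and region parameters untouched.

```python
from urllib.parse import urlsplit, urlunsplit


def top_stories_to_rss(url: str) -> str:
    """Replace the 'topstories' path segment with 'rss', keeping the query string."""
    parts = urlsplit(url)
    segments = ["rss" if s == "topstories" else s for s in parts.path.split("/")]
    return urlunsplit(parts._replace(path="/".join(segments)))


print(top_stories_to_rss("https://news.google.com/topstories?hl=en-US&gl=US&ceid=US:en"))
# https://news.google.com/rss?hl=en-US&gl=US&ceid=US:en
```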

Google News website and RSS feed side by side.

Copy and paste the RSS feed URL into your favourite news reader to subscribe to it and receive updates.

Topics

Google also provides news for different topics:

You can select from a set of main topics using the links in the left sidebar, or use the topic search to find trending topics or search for topics.

Google News topics
Google News topic selection

RSS feeds for topics

To get RSS feeds for topics you want, replace ‘/topics‘ in the URL with ‘/rss/topics‘:

  • [RSS] Technology
    https://news.google.com/rss/topics/CAAqJggKIiBDQkFTRWdvSUwyMHZNRGRqTVhZU0FtVnVHZ0pIUWlnQVAB?hl=en-US&gl=US&ceid=US%3Aen
  • [RSS] Health
    https://news.google.com/rss/topics/CAAqIQgKIhtDQkFTRGdvSUwyMHZNR3QwTlRFU0FtVnVLQUFQAQ?hl=en-US&gl=US&ceid=US%3Aen
  • [RSS] Entertainment
    https://news.google.com/rss/topics/CAAqJggKIiBDQkFTRWdvSUwyMHZNREpxYW5RU0FtVnVHZ0pWVXlnQVAB?hl=en-US&gl=US&ceid=US%3Aen

You can change the language and region for topics too in the same way as we did for top stories (either using the Google News interface or by modifying the URL and changing the language and country codes).
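The two steps above (switching a topic page to its feed, then changing language and region) can be sketched in Python as follows. The helper names are hypothetical. The sketch takes the hl, gl and ceid values as given rather than deriving them, because the safest source for those values is the URL Google shows after you pick a language and region.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def topic_page_to_rss(url: str) -> str:
    """Turn a /topics/... page URL into its /rss/topics/... feed URL."""
    parts = urlsplit(url)
    if parts.path.startswith("/topics/"):
        parts = parts._replace(path="/rss" + parts.path)
    return urlunsplit(parts)


def set_locale(url: str, hl: str, gl: str, ceid: str) -> str:
    """Overwrite the language/region parameters, keeping any other parameters."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query.update({"hl": hl, "gl": gl, "ceid": ceid})
    # urlencode writes ':' as %3A, matching the feed URLs listed above
    return urlunsplit(parts._replace(query=urlencode(query)))


# 'TOPIC_ID' is a placeholder, not a real Google News topic identifier
page = "https://news.google.com/topics/TOPIC_ID?hl=en-US&gl=US&ceid=US%3Aen"
print(set_locale(topic_page_to_rss(page), hl="en-GB", gl="GB", ceid="GB:en"))
```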

Advanced search

Google News also lets you get news items based on your search criteria. Click the arrow at the end of the search field to open the advanced search form.

Google News advanced search

Examples:

RSS feeds for advanced search

To get RSS feeds for search results, replace ‘/search‘ in the URL with ‘/rss/search‘:
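As a rough Python sketch of that replacement: the example URL below assumes the search page carries the query in a q parameter, so copy the real URL from your address bar rather than relying on this one. The function name is ours.

```python
from urllib.parse import urlsplit, urlunsplit


def search_to_rss(url: str) -> str:
    """Replace the '/search' path with '/rss/search', leaving the query untouched."""
    parts = urlsplit(url)
    if parts.path == "/search":
        parts = parts._replace(path="/rss/search")
    return urlunsplit(parts)


# Assumed shape of a results URL copied from the browser
results_page = "https://news.google.com/search?q=rss%20feeds&hl=en-US&gl=US&ceid=US%3Aen"
print(search_to_rss(results_page))
# https://news.google.com/rss/search?q=rss%20feeds&hl=en-US&gl=US&ceid=US%3Aen
```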

News from a particular site

What if we’re only interested in news from a particular site? If it’s indexed by Google News we can limit results to that site using the same advanced search fields covered in the previous section.

Not all blogs and news sites indexed by the main Google search engine are available in Google News.

Here’s how you’d get recent news items indexed by Google News from The Grayzone website:

Google News advanced search with Website field filled in.
Telling Google News to return news items posted on The Grayzone in the last week

This will result in the following search query: site:thegrayzone.com when:7d

The Grayzone has its own feed, so unless we want to narrow the search down further, we’d be better off using the official feed. And with Google’s hostile attitude to independent news sites we’d advise against relying on Google News feeds for non-corporate news sources whenever possible.

But there are sites indexed by Google News that don’t publish their own feeds. Using Google News’ feed output, you can get RSS feeds for these sites. Reuters is one example. You can get reuters.com news items published in the last hour using this Google News query: site:reuters.com when:1h

(For sites that don’t publish feeds and aren’t on Google News, our Feed Creator service might help.)

It’s also possible to include a path segment in the site: operator to limit results to a specific category. For example, Reuters publishes its technology news at https://www.reuters.com/technology/, so you can limit Google News results to news items from this category by using: site:reuters.com/technology

RSS feeds for specific sites

We’re still using the search endpoint here, so like before, replace ‘/search‘ in the URL with ‘/rss/search‘:

News from a number of specific sites

One useful feature of Google News is being able to combine searches. We can, for example, tell Google News to return news items from two or three specific sites in a single feed.

After you run a search, you can edit the query by hand in the search field to use additional operators.

Let’s say we want news items published in the last 7 days from two sites, The Grayzone and MintPress News. Here’s the query to use:

site:thegrayzone.com OR site:mintpressnews.com when:7d

To add another site, say Fairness and Accuracy in Reporting, append it with another ‘OR’:

site:thegrayzone.com OR site:mintpressnews.com OR site:fair.org when:7d
Google News search results for The Grayzone and MintPress News.

RSS feeds for specific sites

As before, we’re still using the search endpoint, so replace ‘/search‘ in the URL with ‘/rss/search‘:
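If you maintain a list of sites, the combined query and its feed URL can be generated rather than typed. This is a sketch only: the helper names are ours, the q parameter name is an assumption about how Google News carries the query, and the default language and region values are taken from the US example earlier in this post.

```python
from urllib.parse import urlencode


def sites_query(sites, when=None):
    """Join site: operators with OR and optionally append a when: time limit."""
    query = " OR ".join(f"site:{site}" for site in sites)
    if when:
        query += f" when:{when}"
    return query


def search_feed_url(query, hl="en-US", gl="US", ceid="US:en"):
    """Build an /rss/search URL for the query (parameter name 'q' is assumed)."""
    params = {"q": query, "hl": hl, "gl": gl, "ceid": ceid}
    return "https://news.google.com/rss/search?" + urlencode(params)


query = sites_query(["thegrayzone.com", "mintpressnews.com", "fair.org"], when="7d")
print(query)
# site:thegrayzone.com OR site:mintpressnews.com OR site:fair.org when:7d
print(search_feed_url(query))
```

A path can be included in a site entry too, e.g. "reuters.com/technology", since the function passes each entry through unchanged.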

So that’s how you get Google News RSS feeds. If you have any questions, feedback or suggestions, please let us know on our forum.

How to use proxy servers with Feed Creator and Full-Text RSS
https://www.fivefilters.org/2021/proxy-servers-in-feed-creator-and-full-text-rss/
Fri, 30 Apr 2021 12:22:38 +0000

In an earlier post we looked at how routing requests through proxy servers could help with content retrieval for some sites. We also showed you how to enable proxy use in our Feed Control service for feeds that need it.

In this post we’ll look at how to configure the self-hosted versions of our Feed Creator and Full-Text RSS software to use proxy servers.

Storm Proxies as proxy provider

You can use any proxy service provider you like, but in this guide we’ll be using Storm Proxies, specifically its Dedicated Rotating Proxies. That’s what we use in our Feed Control service and it’s worked well for us for the feeds which have needed proxy routing. (If you encounter feeds that need a more specialised solution, you can still follow these steps but enter the details associated with whatever service you choose to use.)

With this particular service Storm Proxies requires you use IP authentication, so the IP address belonging to your server hosting our software needs to be registered in your Storm Proxies account.

Storm Proxies client area

If you’ve set up our software on a VPS, you’ll be able to find its IP address in your hosting account. That’s what you’d enter in the Authorized IPs field.

Depending on the package you purchase from Storm Proxies, you will be able to enter between 1 and 5 IP addresses in this field. The cheapest package ($14/month) only allows 1 IP address and 10 simultaneous connections. If you’re running our software on a single VPS and can ensure the feeds won’t all be requested simultaneously in a way that exceeds the 10 simultaneous connections limit, this package will work fine. Otherwise Storm Proxies offers bigger packages that increase the number of simultaneous connections and IP addresses.

Proxy configuration in Full-Text RSS/Feed Creator

Both Full-Text RSS and Feed Creator let you configure how proxy servers are used. The options are:

  1. Never use proxy
  2. Always use proxy
  3. Only use proxy when &proxy request parameter is used

A proxy server should only be needed in situations where a direct connection doesn’t work (see our previous post for more details), so we recommend option 3 above. To enable that mode, you’ll want to edit the config file for Full-Text RSS/Feed Creator.

Best practice for both applications is to set up a custom config file (instructions in the README.txt file distributed with each application) so that future updates to the software don’t overwrite your settings.

There are 3 configuration options in the config file that you’ll want to edit. Let’s go through them one by one.

Step 1: Enter proxy server IP and port

After you create your Storm Proxies account, you’ll be given an IP address and port number. You’ll actually be given a set of 3 IPs: Main gateway, 3-minute gateway, 15-minute gateway.

Proxy gateways provided by Storm Proxies

We recommend using the main gateway IP in our software.

$options->proxy_servers = ['stormproxies' => ['host' => 'x.x.x.x:xxxxx']];

Make sure to edit the above and replace ‘x.x.x.x:xxxxx’ with the IP address and port (number after the colon) for the main gateway displayed in your Storm Proxies account.

If you’re using a different proxy service that requires a username and password for authentication, you can add that with the ‘auth’ key, e.g.: ['host'=>'x.x.x.x:xxxxx', 'auth'=>'user:pass']

Step 2: Disable proxy use by default

We don’t want the proxy server to be used for every request, so let’s disable it by default.

$options->proxy = false;

Step 3: Allow proxy use per request

But we do want to be able to indicate that the proxy server should be used for certain requests, so we want to enable proxy override.

$options->allow_proxy_override = true;

Using the proxy service

With the above changes, you won’t notice any difference when using Feed Creator or Full-Text RSS, and won’t see an option for choosing a proxy server from the interface of either application.

To indicate that the proxy server you entered should be used, you will need to change the feed URL generated by Full-Text RSS or Feed Creator and add &proxy=stormproxies as a request parameter.

Full-Text RSS example

.../full-text-rss/makefulltextfeed.php?url=example.org%2Ffeed&proxy=stormproxies

Feed Creator example

.../feed-creator/extract.php?url=example.org...&proxy=stormproxies

Failing requests?

When using a rotating proxy service, the results can be hit and miss because requests are routed through different servers. If something doesn’t load, try refreshing to route through a different server. Feeds are intended to be polled regularly for updates, so the occasional failing request shouldn’t be of concern.

In Full-Text RSS, however, if content cannot be retrieved (due to an extraction, connection, or proxy failure) the feed item will still be returned, but with a message [unable to retrieve full-text content]. When using a rotating proxy, you will probably want to tell Full-Text RSS to exclude items that cannot be retrieved, because a future attempt through a different server might succeed. To do that, add the parameter &exc=1:

.../full-text-rss/makefulltextfeed.php?url=example.org%2Ffeed&proxy=stormproxies&exc=1

This can also be enabled via the Full-Text RSS interface by selecting ‘remove item from feed’ in the ‘If extraction fails’ field.
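If you have several feed URLs to update, the parameters can be appended with a small script instead of by hand. The Python sketch below only manipulates the URL string. The function name and the example.com host are placeholders, while proxy and exc are the request parameters described above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_proxy(feed_url: str, proxy_name: str, exclude_failed: bool = False) -> str:
    """Add &proxy=<name> (and optionally &exc=1) to a generated feed URL."""
    parts = urlsplit(feed_url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    # Drop any earlier proxy/exc values so the parameters are not duplicated
    params = [(key, value) for key, value in params if key not in ("proxy", "exc")]
    params.append(("proxy", proxy_name))
    if exclude_failed:
        params.append(("exc", "1"))
    return urlunsplit(parts._replace(query=urlencode(params)))


# example.com stands in for wherever your copy of Full-Text RSS is hosted
feed = "https://example.com/full-text-rss/makefulltextfeed.php?url=example.org%2Ffeed"
print(with_proxy(feed, "stormproxies", exclude_failed=True))
# https://example.com/full-text-rss/makefulltextfeed.php?url=example.org%2Ffeed&proxy=stormproxies&exc=1
```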

More than one proxy service?

To use different proxy services for different feeds, you can enter more in the configuration file and give each one a unique name. Then pass the name of the proxy service that should be used in the proxy request parameter. So some feeds might have &proxy=proxy1 and others &proxy=proxy2.

Here’s how the configuration might look:

$options->proxy_servers = [
  'proxy1' => ['host' => 'x.x.x.x:xxxxx'], 
  'proxy2' => ['host' => 'x.x.x.x:xxxxx', 'auth' => 'user:pass']
];

Testing the proxy service

The best way to test if the proxy service is working is by fetching a page which shows you the IP address making the request. We’ll use myexternalip.com here. If you load that now, it should show you the IP address associated with your current connection.

When using our software without a proxy service, the IP address shown from such a page will be the IP address of the server running Full-Text RSS or Feed Creator. When the same request goes through a proxy service, the IP address shown will be one connected to the proxy service, not your server.

Testing Full-Text RSS

  1. Enter the URL https://myexternalip.com/ into the URL field in Full-Text RSS and hit ‘Create Feed’
  2. You should see the IP address of the server Full-Text RSS is hosted on in the results:
    “My External IP address – [your server IP]”
  3. Edit the Full-Text RSS URL in your browser’s address bar based on the instructions in the previous section, so the final URL will look something like: .../full-text-rss/makefulltextfeed.php?url=myexternalip.com&proxy=stormproxies
  4. Now you should be shown a different IP address:
    “My External IP address – [IP associated with proxy service]”

When testing Full-Text RSS with pages that show you the IP address making the request, be aware that Full-Text RSS only returns content when it can extract what it determines is the main content element on a page. It’s designed for web articles such as news stories and blog posts, so will typically look for clues, such as a series of paragraphs, to help it identify the article body. When used on pages that aren’t structured like a text article, it often won’t find a suitable element and won’t return a result at all. You’ll instead see the message ‘[unable to retrieve full-text content]‘. At the time of writing, it is able to extract from the myexternalip.com page used in the steps above.

Some sites, such as https://api.ipify.org and https://ip.seeip.org, will simply return the IP as text with zero HTML. To get Full-Text RSS to display these, you should use &debug=rawhtml in the request, e.g.: .../full-text-rss/makefulltextfeed.php?url=api.ipify.org&proxy=stormproxies&debug=rawhtml

You’ll then see the HTTP response from the server, including the IP address at the bottom.

Testing Feed Creator

  1. Enter the URL https://myexternalip.com/ into the URL field in Feed Creator and hit ‘Preview’
  2. You should see the IP address of the server Feed Creator is hosted on in the results:
    “My External IP address – [your server IP]”
  3. Click the RSS Feed button to load the RSS feed in your browser
  4. Edit the URL in your browser’s address bar based on the instructions in the previous section, so the final URL will look something like: .../feed-creator/extract.php?url=myexternalip.com&proxy=stormproxies
  5. Now you should be shown a different IP address:
    “My External IP address – [IP associated with proxy service]”

If you’re using the rotating proxy service from Storm Proxies as described in this guide, the IP address you see connected to the proxy service won’t be the IP address of the gateway you entered in the configuration file. It will be different, and should change on each request (provided you’ve not enabled caching in Full-Text RSS/Feed Creator).
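To compare the two results without reading them by eye, you could save the output of the direct and the proxied request and check them with something like the Python sketch below. It is an illustration under simple assumptions: it only recognises IPv4 addresses, and it takes the last address in the text because the raw HTML debug output shows the IP at the bottom of the response. The function names are ours.

```python
import re

# Loose IPv4 pattern: four dot-separated groups of 1-3 digits
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_ip(body: str):
    """Return the last IPv4-looking string in the text, or None if there is none."""
    matches = IPV4.findall(body)
    return matches[-1] if matches else None


def proxy_in_use(direct_body: str, proxied_body: str) -> bool:
    """True when both responses contain an IP address and the two differ."""
    direct, proxied = extract_ip(direct_body), extract_ip(proxied_body)
    return direct is not None and proxied is not None and direct != proxied


# Addresses below come from the ranges reserved for documentation
print(proxy_in_use("My External IP address – 203.0.113.7",
                   "My External IP address – 198.51.100.23"))
# True
```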

Using proxy servers for content retrieval
https://www.fivefilters.org/2021/proxy-server-support/
Mon, 26 Apr 2021 13:35:40 +0000

We’ve added proxy support to Feed Control in our latest update. This post will explain what it does, why you might need it, and how you can enable it.

If you’re a user of our self-hosted Full-Text RSS or Feed Creator software, we’ll be covering how you can enable proxy support in those applications in the next post.

What’s a proxy server?

Proxy servers are used to route HTTP requests (e.g. requests for web pages) through different servers.

When you use our hosted applications (Feed Control, Full-Text RSS or Feed Creator) to fetch content from webpages, those requests go out from our servers in Germany (that’s where we host most of our web services). So when fetching content from example.org, the site will see that someone from Germany is requesting a web page. But it’s also possible to route the same request through a proxy server in the US, or some other country.

Why does it matter where a request originates?

Most of the time, it makes zero difference. A request from Germany will be treated exactly the same as a request from the US. There are situations, however, where it does make a difference.

Geofencing

With the introduction of GDPR in Europe, some sites in the US catering to local communities have decided it’s not worth the hassle to comply with European privacy laws when most of their audience is outside of Europe. They set up geofencing on their sites to refuse access to visitors outside of the US. When you access a site like this from Europe, you’ll often see a message stating that they cannot serve European visitors.

But what happens when someone from the US tries to use our Feed Control, Feed Creator, or Full-Text RSS service with such a site? The request will go out from one of our servers in Germany and will be rejected when it reaches the geofenced site. Regardless of where you live, when you request content via our services, all requests currently look to the target site as if they originate from Germany, because that’s where our servers are based. So certain content accessible to our users in the US won’t be accessible when requested via our services.

Rate limiting

There are also sites that limit the number of requests a single visitor (determined by IP address) can make within a certain timeframe. Such rate limits are usually in place for good reasons. They can prevent malicious activity or excessive requests that can put too much strain on servers. But a sometimes unintended consequence of rate limiting is that requests that would normally be handled fine if made by users directly get rejected when they come from a limited pool of IP addresses belonging to a service acting on behalf of those same users. To the site receiving these requests, it can look like a handful of users making too many requests, rather than 100 or so users making a reasonable number of requests. You might have experienced something similar if you’ve ever used a VPN service and found yourself unable to load certain sites because of “too many requests”.

How does a proxy server help?

To access sites that enforce geofencing (mostly in our experience US sites that refuse to serve European visitors), we can route requests through US proxy servers. Now the geofenced site sees a request from the US and no longer blocks it.

To handle the rate limiting issue above, a rotating proxy service can be used to distribute requests through a number of different servers, rather than one.
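To see why spreading requests helps, here is a toy Python model of a site that allows a fixed number of requests per IP address within one time window. It is a simplification for illustration only: real sites use their own limits, and a real rotating proxy service chooses exit servers its own way rather than in strict round-robin order.

```python
from collections import Counter
from itertools import cycle


def rejected_requests(num_requests: int, exit_ips: list, limit_per_ip: int) -> int:
    """Count requests rejected when each IP may make limit_per_ip requests per window."""
    seen = Counter()
    rejected = 0
    rotation = cycle(exit_ips)  # round-robin over the available IP addresses
    for _ in range(num_requests):
        ip = next(rotation)
        seen[ip] += 1
        if seen[ip] > limit_per_ip:
            rejected += 1
    return rejected


# 100 requests from a single server IP, limit of 20 per IP
print(rejected_requests(100, ["203.0.113.1"], 20))
# 80

# The same 100 requests spread across 10 proxy IPs
print(rejected_requests(100, [f"203.0.113.{n}" for n in range(1, 11)], 20))
# 0
```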

Proxy use in Feed Control

If you use our Feed Control application, we now let you enable proxy use for feeds you add to your account. When enabled, Feed Control will use a rotating proxy service to route requests through different US servers when fetching web content.

The feature is currently only available for two types of feeds in Feed Control:

  • Expanded feeds (when you enable ‘fetch full text’ to have article content retrieved from the source site)
  • Webpage to RSS feeds (feeds built with our Feed Creator application and then added to Feed Control)

In most cases, there will be no need to enable proxy use, so we suggest you try without it first and only enable if you have trouble. You can also contact us via the support link if you need assistance with a feed.

Enabling proxy use in Feed Control

It’s not yet possible to preview feed output with proxy use enabled without adding the feed to your account first (we’ll add support for that in a future update). So if you suspect the content you’re after is not being retrieved because of the issues listed above, you should add your feed in Feed Control’s management console and then enable proxy use.

To do that, follow the steps below:

  1. Log in to your Feed Control account
  2. From the left sidebar select Feeds
  3. Click Add Feed
  4. Paste the feed address into the URL field and click Add Feed
  5. In the Feed Details view that loads, click the Edit button
  6. In the Proxy field, select US Rotating
  7. Click Update Feed
  8. From the actions drop down, select Refresh feed
  9. Click the Feed items tab to see if new items appear (it might take a minute or so for the feed to refresh, so try refreshing the page if you don’t see anything immediately)

We currently limit the number of feeds on which you can enable proxy use based on your plan:

  • Standard – proxy use on up to 10 feeds
  • Plus – proxy use on up to 20 feeds
  • Business – proxy use on up to 50 feeds

If you need more than this, or if you have trouble with any feeds that you’d like us to take a look at, please contact us using the support link in Feed Control.

In the next post we’ll show you how to enable proxy use in our self-hosted software: Full-Text RSS and Feed Creator. We’ll show you how to configure our applications to use the Storm Proxies service, but any other proxy provider should work too.

How to turn a webpage into an RSS feed using Feed Creator – Part 2
https://www.fivefilters.org/2021/how-to-turn-a-webpage-into-an-rss-feed-pt2/
Thu, 01 Apr 2021 23:01:31 +0000
Feed Creator can turn a webpage into an RSS feed. Following on from part 1, in this post we're going to show you how to include additional information in the feed such as item dates, images and summaries.

In part 1 we showed you how to turn a webpage into an RSS feed using our Feed Creator application and its simple selector mode. In this post we’ll show you how to use advanced mode and CSS selectors to include additional item information such as the publication date, featured image, and summary text.

CSS selectors used to target elements on a web page

If you’re new to Feed Creator, we recommend you start by reading part 1 and then continue here.

What’s a CSS selector?

CSS is a standardised web technology primarily used for styling web page elements. As part of its specification, it includes selectors to target HTML elements to be styled. Feed Creator does not concern itself with the styling aspect of CSS, but does accept CSS selectors to help extract elements to be used in the feeds it produces.

Generating a feed from a webpage using CSS selectors

In this post we’re going to show you how to create a feed using CSS selectors, step by step. We’ll use Reuters Investigates as our source page, but the technique can be applied to any site.

Short on time?

If you’d rather have us create a feed for you, please submit a custom feed request.

What you’ll need

  1. Some basic knowledge of HTML and CSS selectors
  2. The webpage address (URL) of the source page you want to create a feed from
  3. Our Feed Creator application (we offer a free, hosted service to get started, no signup required – if you find it useful, it’s also available for self-hosting or as a premium, hosted service)
  4. Your browser’s developer tools to inspect the source page’s HTML (we’ll use Firefox’s Developer tools in this guide, but Chrome will be very similar)

Overview

These are the basic steps we’ll be following:

  1. Find appropriate selectors for the main item blocks
  2. Find appropriate selectors for individual item elements (e.g. title, date, image, summary)
  3. Enter the selectors in Feed Creator to generate the feed

Step 1: Make source page and Feed Creator easily accessible

We’ll be switching between the source page and Feed Creator in the steps below, so we recommend you open them in two tabs (or have the windows side-by-side).

Tab 1: Reuters Investigates – reuters.com/investigates/
Tab 2: Feed Creator – createfeed.fivefilters.org

Reuters Investigates and Feed Creator in two separate tabs.

Step 2: [Source page] Identify the items that should be used in the feed

In this example we’re using the Reuters Investigates page, and the areas we’ve marked in red rectangles contain the items of interest.

The items we want to turn into a feed from the Reuters Investigates site.

Step 3: [Feed Creator] Enter the source page URL and choose Advanced Selectors

Now switch to the Feed Creator tab and enter the Reuters Investigates URL in the field labeled ‘Enter web page URL’: https://www.reuters.com/investigates/

Below it, choose ‘Advanced Selectors’

Step 4: [Source page] Create selector for the desired items

To create a usable selector, we’ll want to inspect the desired items and identify the main elements in the underlying HTML. So let’s jump back to our source page.

Move your cursor over one of the items and right-click and choose ‘Inspect Element’ in Firefox (‘Inspect’ in Chrome).

Firefox context menu showing the 'Inspect Element (Q)' menu item.

You’ll now see the item’s underlying HTML markup. What we’re looking for is an HTML element for a single item. Later, we will use additional selectors to target title, summary, image and date elements within each selected item.

Firefox's inspector showing the underlying HTML

A common mistake is to identify an element that contains all the items and to create a selector for that. For example, the parent element of the highlighted <article> element in the image above is such an element, so targeting it with div.section-articles would be selecting a single element. That’s not what Feed Creator expects as the item selector (unless there’s only ever a single item on the page).

We have a number of options here for choosing a suitable CSS selector:

  • article to select all <article> elements on the page
  • article.section-article-container to select all <article> elements with a class attribute containing “section-article-container”
  • div.section-article to select all <div> elements with a class attribute containing “section-article”

Javascript-generated elements

At the moment Feed Creator only works with HTML elements that are returned by the server in its initial response. Some sites rely on Javascript to construct elements and sometimes pull in the desired items via additional requests after the page has loaded in your browser. When you inspect elements using your browser’s developer tools, as we’re doing here, you’re seeing the final result after Javascript execution. This might not be what Feed Creator sees when it processes the page.

The easiest way to make sure you’re not using attributes that Feed Creator cannot see is to disable Javascript in your browser temporarily. In Firefox’s developer tools, you can disable it temporarily in the settings panel (F1 to toggle) and ‘Disable Javascript’ in the ‘Advanced Settings’ section. There’s a similar setting in Chrome. You can then inspect elements using your browser’s developer tools.
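Another way to check what the server returns before any Javascript runs is to fetch the page with a script and count matches in the raw HTML. The Python sketch below is not how Feed Creator itself works. It only understands selectors of the form tag.class, and the sample markup is made up to resemble the structure described above.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class SelectorCounter(HTMLParser):
    """Counts start tags matching a simple 'tag.class' selector."""

    def __init__(self, tag: str, css_class: str):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.count += 1


def count_matches(html: str, selector: str) -> int:
    """Count elements in raw HTML matching a 'tag.class' selector."""
    tag, css_class = selector.split(".", 1)
    counter = SelectorCounter(tag, css_class)
    counter.feed(html)
    return counter.count


def fetch_raw_html(url: str) -> str:
    """Fetch the server's initial response; no Javascript is executed."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


# Made-up markup for illustration; use fetch_raw_html(...) for a real page
sample = (
    '<div class="section-articles">'
    '<article class="section-article-container"></article>'
    '<article class="section-article-container featured"></article>'
    '<article class="related"></article>'
    "</div>"
)
print(count_matches(sample, "article.section-article-container"))
# 2
print(count_matches(sample, "div.section-articles"))
# 1
```

A count of zero for a selector that matches in your browser's inspector is a hint that the elements are being built by Javascript.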

Step 5: [Source page] Ensure selector targets all desired items

We want the selector we choose to match all the elements we want, and nothing more. An easy way to test this is to enter the selectors, one by one, into the Search HTML field in developer tools (CTRL+F in Chrome to bring up the search field).

Both Firefox and Chrome will show you how many elements are selected by the selector and will allow you to move through them by hitting Enter.

Firefox's developer tools showing HTML elements matching CSS selector 'article.section-article-container'
Using Firefox’s Search HTML field to find all HTML elements matching the CSS selector ‘article.section-article-container’

The HTML search field in developer tools is not exclusively for CSS selectors, so when entering ‘article’, Firefox will also find instances of the text ‘article’ wherever it appears in the HTML. To avoid this, change the input to something that more resembles a CSS selector, such as by adding ‘html’ before the selector: ‘html article’. This will find all <article> elements within the root <html> element, essentially the same CSS selector as just ‘article’.

Another option is to open the console in developer tools with CTRL+Shift+K (CTRL+Shift+J in Chrome) and enter your CSS selector in a call to $$(), for example: $$('article'). You will then see a list of selected elements which you can hover over to highlight on the page or click into to view in the element inspector panel.

All three selectors listed in the previous step match the content we want on the page, so we could go with any one of them. When deciding which selector to use, we like to consider the likelihood of a selector matching more than we want in the future, or a completely different set of items in the case of a site redesign. That’s more likely to happen with article (for example, an element <article class="related"> could get added at some point in the future) than with the more targeted article.section-article-container or div.section-article. In situations like this, we’d pick one of the latter two.

We’d also caution against going too far in the other direction and choosing a very specific selector, such as section.main div.section-articles article.section-article-container. This will also match the items we want, but now we’ve made our selector quite brittle by being overly reliant on the HTML structure of the page as it is now.

So far we’ve found selectors which match the content we want, but before we move on, let’s make sure they don’t match items we don’t want. If you scroll down on the page, you will see one of the elements selected, rather than containing an investigative piece from Reuters, contains an image with the text “Do you have a news tip? How to contact Reuters securely”.

"Do you have a news tip? How to contact Reuters securely."

That’s not something we want selected, so let’s consider the ways we can remove it in Feed Creator.

Step 6: [Source page] Removing unwanted elements

In part 1, we saw that Feed Creator allows us to use CSS selectors to remove HTML elements. In addition to that, now that we’re in advanced mode, we can modify our CSS selector itself to be more specific about what we want. We can also use Feed Creator’s URL filtering to remove items if they have a particular URL segment. We’re going to look at all these approaches now.

The first thing we want to do is to inspect the unwanted element, as we’ve done before, to see what we can work with:

Inspecting the unwanted item in Firefox.

There’s a lot of similarity between the element names and attributes inside this unwanted item and the other items that we do want to keep.

Here are some differences:

  • The unwanted item contains an id attribute with value “article-3XSVNV3WN1”
  • The unwanted item URL contains the segment “/tips/” at the end: https://www.reuters.com/investigates/special-report/tips/

And here are three ways of using these differences to remove this item from the results:

  1. Using ‘Enable cleanup’ in Feed Creator and adding the CSS selector: #article-3XSVNV3WN1
  2. Changing our item selector to exclude the unwanted item, e.g. article.section-article-container:not(#article-3XSVNV3WN1)
  3. Using ‘Enable remove filters’ in Feed Creator and adding the URL segment: /tips/

Our recommendation is to be cautious when using attribute values that contain a sequence of letters and numbers, as they’re often a sign that the value is randomly generated and could change in subsequent versions of the page. Sure enough, if we check the Internet Archive for previous versions of this page, we’ll see the code in this id attribute (3XSVNV3WN1) does indeed change, and therefore isn’t suitable as part of a selector.
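The URL-based approach (option 3) can be sketched in a few lines of Python. This is only an illustration of the idea: the function name, the item dictionaries, and the first example URL are hypothetical, and we're assuming a plain substring match, which may differ from how Feed Creator implements its filter.

```python
def remove_by_url_segment(items, segments):
    """Drop items whose URL contains any of the given segments."""
    return [
        item for item in items
        if not any(segment in item["url"] for segment in segments)
    ]

# Hypothetical extracted items; only the second URL comes from the page above.
items = [
    {"title": "The Fatal Shore",
     "url": "https://www.reuters.com/investigates/special-report/example-story/"},
    {"title": "Do you have a news tip?",
     "url": "https://www.reuters.com/investigates/special-report/tips/"},
]

kept = remove_by_url_segment(items, ["/tips/"])
print([item["title"] for item in kept])
```

Because the filter depends on the URL rather than on a generated id, it should keep working if the id attribute changes.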

Step 7: [Feed Creator] Add item selector and enable remove filters

Let’s use what we have so far and enter it into Feed Creator. Find the field labeled ‘Item selector (CSS)’ and enter: article.section-article-container.

Main item selector in Feed Creator

Next, scroll down until you see the ‘Enable remove filters’ toggle and switch it on. In the field labeled “Remove item if item URL contains any of these segments:”, enter /tips/ and hit Enter.

URL segment filter in Feed Creator

Hit preview now to make sure Feed Creator returns results.

Feed Creator preview with only the item selector used.
Feed Creator with item selector and remove filter (click to load)

The free version of Feed Creator only returns the latest 5 items, so you won’t see any difference in results if you enable/disable the remove filter because the item we want removed is not one of the first 5 items on the page.

Step 8: [Source page] Create selectors for additional elements

Now that we’ve got a feed with the items we want included, let’s expand it to include each item’s publication date, image and summary text. We’ll also be explicit about targeting the title element. Feed Creator provides fields for you to enter additional selectors for these.

Feed Creator CSS selector fields to target additional information for the feed.
Additional selector fields in Feed Creator’s advanced mode

Before we get started, you should be aware of some differences between the main item selector (the one we used in step 7) and the ones we’re going to use now:

  • The main item selector is applied within the context of the entire page
  • The selectors here are applied within the context of the items selected by the main item selector
  • The main item selector is intended to select multiple items
  • The selectors here will only select the first matching item

How do we find these additional selectors? The same way we did before: by inspecting an item in the browser to find suitable selectors to target the information we want.

Item title selector (CSS)

Item titles are in <h2> elements on this page:

<h2 class="subtitle" itemprop="headline">The Fatal Shore</h2>

Feed Creator selector: h2

Alternatives:

  • .subtitle
  • h2.subtitle
  • h2[itemprop="headline"]

Item description selector (CSS)

Item summaries are in <p> elements:

<p itemprop="description">Genomic scientists raced against time to find out what was causing the deadly surge in cases despite a national lockdown. <span class="tail">Full Story</span></p>

Feed Creator selector: p

Alternatives:

  • p[itemprop="description"]
  • *[itemprop="description"]

You can remove the “Full Story” text in the HTML above by using the HTML cleanup feature of Feed Creator and adding .tail or span.tail.
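As a rough illustration of what that cleanup achieves, the Python sketch below removes the span.tail element from the paragraph shown above before reading its text. It uses the standard library's ElementTree on the well-formed snippet. The helper name is hypothetical, and Feed Creator's own cleanup may work differently.

```python
import xml.etree.ElementTree as ET

SNIPPET = (
    '<p itemprop="description">Genomic scientists raced against time to find '
    'out what was causing the deadly surge in cases despite a national '
    'lockdown. <span class="tail">Full Story</span></p>'
)

def description_without_tail(markup):
    """Return the paragraph text with any child <span class="tail"> removed."""
    p = ET.fromstring(markup)
    # findall returns a list, so removing children while looping is safe.
    for span in p.findall("span[@class='tail']"):
        p.remove(span)
    return "".join(p.itertext()).strip()

summary = description_without_tail(SNIPPET)
print(summary)
```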

If there’s no description available, and you’d like to have one, you can ignore it for now and later pass the generated feed to our Full-Text RSS application via the Service Shortcuts button in Feed Creator. Full-Text RSS can recreate the feed by pulling in additional data for each item.

Item date selector (CSS)

Item dates are in <time> elements:

<time itemprop="datePublished" datetime="2021-03-26T11:00:00+00:00">March 26, 2021</time>

Feed Creator selector: time

Alternatives:

  • time[itemprop="datePublished"]
  • time[datetime]
  • time @datetime (Not a pure CSS selector, the @ part is Feed Creator-specific, see below)

Feed Creator lets you select attribute values using @attribute-name at the end of the selector. To select the more computer-readable datetime attribute, we could enter: time @datetime
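The difference between the two values is easy to see in a short Python sketch using only the standard library. We're assuming here that a plain time selector yields the element's visible text and that time @datetime yields the attribute value. The parsing step is our own illustration, not Feed Creator's code.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

SNIPPET = (
    '<time itemprop="datePublished" '
    'datetime="2021-03-26T11:00:00+00:00">March 26, 2021</time>'
)

el = ET.fromstring(SNIPPET)

visible_text = el.text           # roughly what the selector "time" targets
attr_value = el.get("datetime")  # roughly what "time @datetime" targets

# The attribute is an ISO 8601 timestamp, so it parses unambiguously and
# carries the time of day and UTC offset that the visible text lacks.
published = datetime.fromisoformat(attr_value)

print(visible_text)
print(published.isoformat())
```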

If a site doesn’t display the date and you’d like it included in the generated feed, you can ignore the date for now and pass the generated feed without a date to our Feed Control service. There you can tell Feed Control to generate a new feed and use the date on which it detects each new item as that item’s publication date.

Item image selector (CSS)

Item images are in <img> elements:

<img itemprop="contentURL" class="img-fluid" src="https://www.reuters.com/investigates/special-report/assets/section-leads/homepage/health-coronavirus-uk-variant/home_HEALTH-CORONAVIRUS-UK-VARIANT.jpg?v=010214260321">

Feed Creator selector: img

Alternatives:

  • img.img-fluid
  • img[itemprop="contentURL"]

Feed Creator will show an [image] link in the preview if it successfully finds a URL using the selector. The selected image appears inside the feed output in three places: as a <media:content> element, an <atom:link rel="enclosure"> element, and embedded inside the <content:encoded> element as an HTML <img> element.
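To show what those three places look like in a feed item, here is a minimal Python sketch that builds one with the standard library. The namespace URIs are the standard ones for Media RSS, Atom and the RSS content module. The exact attributes and layout Feed Creator emits are assumptions, and the example URLs are placeholders.

```python
import xml.etree.ElementTree as ET

MEDIA = "http://search.yahoo.com/mrss/"
ATOM = "http://www.w3.org/2005/Atom"
CONTENT = "http://purl.org/rss/1.0/modules/content/"
for prefix, uri in (("media", MEDIA), ("atom", ATOM), ("content", CONTENT)):
    ET.register_namespace(prefix, uri)

def build_item(title, link, image_url):
    """Build an RSS <item> that carries the image URL in three places."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    # 1. Media RSS element
    ET.SubElement(item, f"{{{MEDIA}}}content", url=image_url)
    # 2. Atom enclosure link
    ET.SubElement(item, f"{{{ATOM}}}link", rel="enclosure", href=image_url)
    # 3. HTML <img> inside content:encoded (ElementTree escapes the markup
    #    rather than wrapping it in CDATA)
    encoded = ET.SubElement(item, f"{{{CONTENT}}}encoded")
    encoded.text = f'<img src="{image_url}">'
    return item

item = build_item(
    "The Fatal Shore",
    "https://example.org/story",      # placeholder URL
    "https://example.org/story.jpg",  # placeholder URL
)
xml_text = ET.tostring(item, encoding="unicode")
print(xml_text)
```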

Step 9: [Feed Creator] Enter additional selectors and preview the feed

Now we just need to enter the selectors identified in the previous step into Feed Creator. Once you do, hit Preview to see the results. You should see something like this:

The final feed preview in Feed Creator
Feed Creator preview using the additional selectors to include extra information (click to load)

Feed Creator’s preview links to the images, without displaying them, but they are included in the feed. For example, here’s how Feedly shows the feed we just generated:

Feedly view of the Feed Creator feed
Feedly’s view of the feed we just created

Done!

You can now use the buttons Feed Creator provides in the Result column to use your generated RSS feed in other applications. If you subscribe to the feed in a news reading application, you’ll be notified when new items are published.

The RSS feed button will load the feed in your browser or prompt you to open it in a supporting application (if you have one installed). You can copy the generated feed URL by right-clicking this button and choosing ‘Copy link location’.

The Subscribe button will open a panel with a list of feed readers. If you see one you use, click its name and we’ll pass the generated feed into the feed reader so you can subscribe to it and be notified of new items.

The Service shortcuts button opens a panel with shortcuts to some of our other applications that can take a feed as input. You can choose ‘RSS with full text’, for example, to have the generated feed passed to our Full-Text RSS application which will expand the feed by pulling in the article content for each item.

That’s it. To recap, we used Feed Creator to turn a webpage into an RSS feed by extracting elements from the source page (Reuters Investigates in this example) using CSS selectors. You should now be able to apply the same technique to almost any page you like.

What about future changes to the source site?

What we’ve done by using selectors is to ensure that new items published on the site will automatically be included in our feed. But what if the structure of the site changes in such a way that our selectors no longer match the items? If that happens you’ll find your feeds will stop picking up new items and you will have to update your CSS selectors to match the new structure of the page.

Feed Creator generates feeds by embedding the entered CSS selectors and filters in its feed URLs, e.g.:

https://createfeed.fivefilters.org/extract.php?url=https%3A%2F%2Fwww.reuters.com%2Finvestigates%2F&item=article.section-article-container&strip_if_url[]=%2Ftips%2F

To edit an existing feed, copy its URL into your browser and change ‘extract.php’ to ‘index.php’:

https://createfeed.fivefilters.org/index.php?url=https%3A%2F%2Fwww.reuters.com%2Finvestigates%2F&item=article.section-article-container&strip_if_url[]=%2Ftips%2F

Feed Creator will now load all your selectors from your feed into its interface and allow you to make changes.
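If you manage several feeds, the URL handling above is easy to script. The Python sketch below rebuilds the example feed URL from its parts and converts it to the editing URL. The function names are our own, and only the url, item and strip_if_url[] parameters shown above are used.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

BASE = "https://createfeed.fivefilters.org/"

def build_feed_url(page_url, item_selector, strip_segments=()):
    """Assemble an extract.php feed URL from a page URL and selectors."""
    params = [("url", page_url), ("item", item_selector)]
    params += [("strip_if_url[]", segment) for segment in strip_segments]
    # safe="[]" leaves the brackets in "strip_if_url[]" unescaped,
    # matching the example URL above.
    return BASE + "extract.php?" + urlencode(params, safe="[]")

def to_edit_url(feed_url):
    """Swap extract.php for index.php so the feed opens in the editor."""
    parts = urlsplit(feed_url)
    if not parts.path.endswith("extract.php"):
        raise ValueError("not an extract.php feed URL")
    path = parts.path[: -len("extract.php")] + "index.php"
    return urlunsplit(parts._replace(path=path))

feed_url = build_feed_url(
    "https://www.reuters.com/investigates/",
    "article.section-article-container",
    ["/tips/"],
)
edit_url = to_edit_url(feed_url)
settings = parse_qs(urlsplit(feed_url).query)  # decoded selectors and filters

print(feed_url)
print(edit_url)
print(settings)
```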

After making changes, you’ll have a new feed URL with new selectors, so you’ll also have to update the previous feed URL wherever it was used before.

If you use the feed in multiple places, or don’t have easy access to update its URL after it’s changed, you can add your Feed Creator feeds to Feed Control first and use its generated feeds instead. You can then make changes to the source feed URL in Feed Control without needing to update the feed URL that Feed Control produces.

Browser extensions to help you find and test CSS selectors

This guide has shown you how to use your browser’s built-in page inspector tools to find and test suitable CSS selectors for Feed Creator. There are also browser extensions available to make the task a little easier. If you’re curious, you can have a look at the following:

  • Try XPath for Firefox lets you enter a CSS selector (choose querySelectorAll in the “Way” dropdown) and see all matching elements highlighted with red dashed borders.
  • Easy Select for Firefox and Chrome extends the browser’s developer tools to make the task of creating a suitable CSS selector easier. You can add class attribute values to your selector easily and instantly get an updated count of selected elements, as well as the option of having them highlighted.
  • SelectorGadget for Chrome lets you find a CSS selector by pointing and clicking on elements on the page.

Discuss

Please share any feedback or questions on our forum.

The post How to turn a webpage into an RSS feed using Feed Creator – Part 2 appeared first on FiveFilters.org.

]]>
How to turn a webpage into an RSS feed using Feed Creator – Part 1 https://www.fivefilters.org/2021/how-to-turn-a-webpage-into-an-rss-feed/ Sat, 20 Mar 2021 16:46:31 +0000 https://www.fivefilters.org/?p=3651 Feed Creator can turn a webpage into an RSS feed. In this post we're going to show you how to create such a feed, step by step.

The post How to turn a webpage into an RSS feed using Feed Creator – Part 1 appeared first on FiveFilters.org.

]]>
April 2021 Update: Reuters have redesigned their homepage so if you’d like to follow the steps in this guide where we examine the HTML in the browser, you should load an older version via the Internet Archive. Older versions will have the same HTML structure shown in this guide.

Our Feed Creator application can turn a webpage into an RSS feed. It’s useful for sites that don’t offer their own feeds. In the last post we noted that Reuters killed off their official RSS feeds last year, and we provided alternative feeds made with our Feed Creator application.

What’s an RSS feed?

RSS feeds typically contain the most recent items associated with a resource. An RSS feed provided by a news site will contain its most recent news items. RSS feeds conform to a standardised, machine-readable XML-based format, allowing them to be read by and integrated into many different systems.

Feed Creator works by converting a set of items on a webpage into a standard RSS feed. The items don’t have to be news stories: they can be search results, job listings, blog posts, podcast episodes, anything really. At a minimum, a feed item should contain either a title or a description.

Webpage into RSS feed with Feed Creator

Use the generated RSS feed to monitor the page for new items, and integrate it with other applications and services that read feeds.

RSS feeds can be expanded with Full-Text RSS from FiveFilters.org, subscribed to with Feedly, connected to other services via Zapier, IFTTT and Integromat.
Many services support RSS feeds, offering useful integration options

Some examples of what you can do using the services above once you’ve generated a feed with Feed Creator:

  • Expand the feed using our Full-Text RSS service to include the full article content
  • Subscribe to it in Feedly to stay up to date with new entries
  • Post new items automatically to Facebook, Twitter or LinkedIn via Zapier, IFTTT or Integromat
  • Share new items automatically with teammates by email or Slack
  • Add new items automatically to a spreadsheet in Excel, Google Sheets, or Airtable
  • Receive webhooks when new items are detected with our Feed Control application

Generating a feed from a webpage

In this post we’re going to show you how to create a feed, step by step. We’ll use Reuters as our source page, but the technique can be applied to any site.

Short on time?

If you’d rather have us create a feed for you, please submit a custom feed request.

What you’ll need

  1. Some basic knowledge of HTML
  2. The webpage address (URL) of the source page you want to create a feed from
  3. Our Feed Creator application (we offer a free, hosted service to get started, no signup required)
  4. Your browser’s developer tools to inspect the source page’s HTML (we’ll use Firefox’s Developer tools in this guide, but Chrome will be very similar)

Step 1: Load the source page and Feed Creator in two separate tabs

We’ll be switching between the source page and Feed Creator in the steps below, so we recommend you open them in two tabs (or have the windows side-by-side).

Tab 1: Reuters home page – reuters.com
Tab 2: Feed Creator – createfeed.fivefilters.org

Step 2: [Source page] Identify the items that should be used in the feed

In this example we’re using the Reuters front page, and the areas we’ve marked in red rectangles contain the items of interest.

Reuters front page

Step 3: [Feed Creator] Enter the source page URL

Now switch to the Feed Creator tab and enter the Reuters URL in the field labeled ‘Enter web page URL’.

Feed Creator with URL input

At this point if you click ‘Preview’, Feed Creator will fetch the page and extract the first set of links it finds. We don’t want these, so let’s instruct Feed Creator to use the links we’re interested in.

Feed Creator lets you use simple selectors or more flexible CSS selectors. We cover the simple mode in this post and in the next post we’ll cover advanced selectors.

Step 4: [Source page] Inspect desired item elements to identify attributes

Now let’s jump back to our source page and examine the HTML markup of our desired elements.

Move your mouse over one of the elements and right-click and choose ‘Inspect Element’ in Firefox (‘Inspect’ in Chrome). You’ll now see the item’s underlying HTML markup.

Firefox developer tools showing inspect view.
“story-content” class attribute shown in Firefox’s Developer Tools

Feed Creator in simple selector mode uses link elements to construct feed items. The link URL becomes the feed item URL, and the link title becomes the item title. In HTML, these are marked up as follows:

<a href="[Link URL]">[Link title]</a>

But web pages typically contain many such links, for example as part of navigation menus, sidebars, footers. We don’t want all these links to end up in the feed, so we want to examine the HTML of our desired items to find an attribute value shared only by those items. Feed Creator can use this attribute value to extract links only from the desired items.

HTML documents often use class attributes to tag elements of the same type. In the screenshot above the class attribute value “story-content” serves that purpose.

Step 5: [Source page] Ensure chosen attribute value is common to all desired elements

We want to make sure that the other news items we want to include in the feed also have this attribute value.

Repeat step 4 (right-click and then ‘Inspect element’) on the other news items, and you’ll see that they have the same “story-content” class attribute value. Great so far.

Does “story-content” also appear on the top story item, which is presented differently to the other items?

Firefox developer tools showing inspect view
“story-content” appears here too

It does. So we’ve found a class attribute value that’s common to the links we’re interested in. Now let’s give it to Feed Creator.

Javascript-generated elements

At the moment Feed Creator only works with HTML elements that are returned by the server in its initial response. Some sites rely on Javascript to construct elements and sometimes pull in the desired items via additional requests after the page has loaded in your browser. When you inspect elements using your browser’s developer tools, you’re seeing the final result after Javascript execution. This might not be what Feed Creator sees when it processes the page.

The easiest way to make sure you’re not using attributes that Feed Creator cannot see is to disable Javascript in your browser temporarily, reload the source page, and then inspect elements using your browser’s developer tools.
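Another way to check is to look for the attribute value in the raw HTML the server returns. The Python sketch below does this for a class value using the standard library's HTML parser. The class and function names are hypothetical, and the two HTML strings are made-up stand-ins for a server response and a browser-rendered page.

```python
from html.parser import HTMLParser

class ClassFinder(HTMLParser):
    """Records whether any element carries the given class token."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.found = True

def class_in_html(html, class_name):
    finder = ClassFinder(class_name)
    finder.feed(html)
    return finder.found

# Made-up examples: what the server sends vs. what the browser shows
# after Javascript has run.
served = '<div id="app"></div><script src="/bundle.js"></script>'
rendered = '<div id="app"><div class="story-content"><a href="/a">A</a></div></div>'

print(class_in_html(served, "story-content"))
print(class_in_html(rendered, "story-content"))
```

In practice you would pass in the response body fetched without a browser (for example with urllib.request.urlopen). If the class is missing there, Feed Creator is unlikely to see it either.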

Step 6: [Feed Creator] Extract links from elements with a particular class attribute value

In Feed Creator, find the field labeled “Get links inside HTML elements with this id or class value” and enter “story-content”. Click the Preview button to see the results.

Feed Creator with simple selector to turn webpage into an RSS feed

You can see that with only two pieces of input (source page URL and ‘story-content’), Feed Creator is able to produce a usable feed for the Reuters site.

Step 7: Removing elements

Notice, however, that in the image above, two items related to the top story also appear:

  • U.S. says had serious talks despite ‘theatrics’
  • U.S., China spar over racism at U.N. meeting

We hadn’t marked these as items of interest in Step 2, so let’s tell Feed Creator to exclude them from the feed output.

Using the method described in step 4, inspect the HTML elements for these two items.

You’ll notice that they appear inside the element marked “story-content”, presented as a bulleted list (<ul> element in HTML).

Using Feed Creator’s cleanup feature, we can tell it to remove all <ul> elements from the page, to ensure the links inside these elements aren’t extracted.

To do this, toggle the ‘Enable cleanup’ switch and enter ‘ul’ in the field labeled ‘Source HTML: Remove elements (CSS)’.

Feed Creator with HTML cleanup enabled
Tell Feed Creator to remove all <ul> elements from the source page

That’s it. Click Preview again and you should see those elements now no longer appear in the results list.
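The combined effect of steps 6 and 7 can be sketched in Python. This is an approximation using the standard library's ElementTree on a simplified, well-formed stand-in for the page. The markup, URLs and function names are hypothetical, and Feed Creator's own extraction may differ.

```python
import xml.etree.ElementTree as ET

# Simplified, made-up stand-in for the source page.
PAGE = """
<body>
  <nav><a href="/world">World</a></nav>
  <div class="story-content">
    <a href="/top-story">Top story headline</a>
    <ul>
      <li><a href="/related-1">Related item one</a></li>
      <li><a href="/related-2">Related item two</a></li>
    </ul>
  </div>
  <div class="story-content"><a href="/second-story">Second story headline</a></div>
</body>
"""

def remove_elements(root, tag):
    """Remove every element with the given tag name (cleanup selector: tag)."""
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == tag:
                parent.remove(child)

def links_inside_class(root, class_name):
    """Collect (title, url) for links inside elements carrying class_name."""
    links = []
    for el in root.iter():
        if class_name in (el.get("class") or "").split():
            for a in el.iter("a"):
                links.append(("".join(a.itertext()).strip(), a.get("href")))
    return links

root = ET.fromstring(PAGE)
before = links_inside_class(root, "story-content")
remove_elements(root, "ul")
after = links_inside_class(root, "story-content")

print(before)  # includes the related links from the <ul>
print(after)   # only the main story links remain
```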

Done!

You can now use the buttons Feed Creator provides in the Result column to use your generated RSS feed in other applications.

The RSS feed button will load the feed in your browser or prompt you to open it in a supporting application (if you have one installed). You can copy the generated feed URL by right-clicking this button and choosing ‘Copy link location’.

The Subscribe button will open a panel with a list of feed readers. If you see one you use, click its name and we’ll pass the generated feed into the feed reader so you can subscribe to it and be notified of new items.

The Service shortcuts button opens a panel with shortcuts to some of our other applications that can take a feed as input. You can choose ‘RSS with full text’, for example, to have the generated feed passed to our Full-Text RSS application which will expand the feed by pulling in the article content for each item.

That’s it for now. To recap, we used Feed Creator to turn a webpage into an RSS feed by extracting elements from the source page (Reuters in this example).

In Part 2 we cover advanced selectors, where you’ll see how we can be much more specific in selecting items from the source page. We’ll also cover how to include item dates, summaries and images in the feed output.

Discuss

Please share any feedback on our forum.

The post How to turn a webpage into an RSS feed using Feed Creator – Part 1 appeared first on FiveFilters.org.

]]>
Reuters RSS feeds dead? https://www.fivefilters.org/2021/reuters-rss-feeds/ Tue, 09 Mar 2021 12:50:55 +0000 https://www.fivefilters.org/?p=3616 Reuters officially stopped producing RSS feeds in June 2020, but using a workaround you can still subscribe to their news items.

The post Reuters RSS feeds dead? appeared first on FiveFilters.org.

]]>
Reuters officially stopped producing RSS feeds in June 2020, but there are workarounds available.

Our Feed Creator application can generate RSS feeds from almost any website, including Reuters. We often get requests from people who want to use Feed Creator to generate RSS feeds from the Reuters site, and as long as Reuters continues publishing news stories on its site, you can use it to do just that.

If you’re here for the RSS feeds, we’ve prepared a few for you:

These feeds are hosted by our Feed Control application and we’ll do our best to keep them working, even if there are changes on the Reuters side.

Update April 2021: Due to site changes at Reuters, some of the feeds above stopped working for about a week. We’ve updated them now so they should continue working. The Wire web page is no longer accessible, so that feed will receive no updates unless the situation changes.

Webpage to RSS feed with Feed Creator

If you’re wondering how we produced the Reuters feeds above, we’ve written how-to posts with step-by-step instructions. You can apply the same technique to generate RSS feeds from many other sites.

The post Reuters RSS feeds dead? appeared first on FiveFilters.org.

]]>
PHP 8 fixes for Feed Creator and Full-Text RSS https://www.fivefilters.org/2021/php-8-fixes-for-feed-creator-and-full-text-rss/ Sun, 07 Mar 2021 01:22:26 +0000 https://www.fivefilters.org/?p=3565 New versions of Feed Creator and Full-Text RSS are now available for self hosting. They fix problems users reported when running the software with PHP 8.

The post PHP 8 fixes for Feed Creator and Full-Text RSS appeared first on FiveFilters.org.

]]>
New versions of Feed Creator and Full-Text RSS are now available for self hosting. They fix problems users reported when running the software with PHP 8.

Feed Creator 2.2.1 and Full-Text RSS 3.9.11 can be downloaded from the customer portal.

Please let us know if you experience any issues.

Changelogs

The post PHP 8 fixes for Feed Creator and Full-Text RSS appeared first on FiveFilters.org.

]]>
Feed Creator 2.2 https://www.fivefilters.org/2021/feed-creator-2-2/ Tue, 23 Feb 2021 19:59:58 +0000 https://www.fivefilters.org/?p=3127 Feed Creator 2.2 is now out! Feed Creator converts web pages into RSS feeds, and can merge and filter existing feeds. Along with a number of bug fixes (see changelog at the end), there are some new features in this version.

The post Feed Creator 2.2 appeared first on FiveFilters.org.

]]>
Feed Creator 2.2 is now available.

Feed Creator converts web pages into RSS feeds, and can merge and filter existing feeds.

What’s new?

Along with a number of bug fixes (see changelog at the end), there are some new features in this version.

We’ve also updated the documentation.

Image selector

It’s now possible to extract an image URL for each feed item.

The image URL will be used in three places in the feed output: inserted into a <media:content> element, an <atom:link rel="enclosure"> element, and a regular <img> element inside a <content:encoded> element.

In the JSON output, the image URL will be included in a property with the name “image”.

Query string cleaner

You can now keep or remove query string parameters from item URLs.

The query string in a URL appears after the question mark symbol, e.g.

http://example.org/article?id=879&session=19382

The URL above has two query string parameters, named ‘id’ and ‘session’. On some sites, query string parameters identify content, and should be preserved. On others, they are used for tracking and can be stripped.

We recommend stripping non-essential query string parameters because they can affect whether feed items are treated as new or not by your feed reader.

The possible values you can use in this new field are:

  • 1 = preserve all (default)
  • 0 = remove all
  • param1,param2 = remove all except param1 and param2

In the example URL above, the ‘id’ parameter identifies the article and should be preserved, but the session parameter is nonessential.

The site might generate a new session ID for its links next time Feed Creator fetches the page, which might result in the same feed items now being treated as new by a feed reader because the URLs now look different from before.

To prevent that happening, we can tell Feed Creator to only preserve the ‘id’ parameter by entering ‘id’ in this field.
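The three possible values can be illustrated with a short Python sketch using only the standard library. It mirrors the behaviour described above (‘1’, ‘0’, or a comma-separated list of names to keep). The function name is our own and this is not Feed Creator's actual code.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def clean_query_string(url, keep="1"):
    """Keep all ('1'), none ('0'), or only the named query string parameters."""
    if keep == "1":
        return url
    parts = urlsplit(url)
    wanted = set() if keep == "0" else {name.strip() for name in keep.split(",")}
    pairs = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name in wanted
    ]
    return urlunsplit(parts._replace(query=urlencode(pairs)))

url = "http://example.org/article?id=879&session=19382"

print(clean_query_string(url, "1"))   # unchanged
print(clean_query_string(url, "0"))   # query string removed
print(clean_query_string(url, "id"))  # only 'id' preserved
```

With ‘id’ as the value, two fetches that differ only in their session parameter produce the same item URL.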

Item guid

A guid is an identifier that’s usually used by feed readers to determine if a feed item is new or not. It’s not required by the RSS spec, but some feed readers might want it included.

By default, the guid is not included when you generate a feed with Feed Creator.

If you’d like it included, you can now tell Feed Creator to generate an ID based on each item’s url, title or both.

If the guid is omitted, most feed readers will use the item URL to determine if a feed item is new or not.
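One way to derive such an identifier is to hash the chosen fields. The Python sketch below does that for the mode names listed in the changelog at the end of this post (‘0’, ‘url’, ‘title’, ‘url_title’). The use of SHA-1 and the function name are assumptions for illustration, and Feed Creator may generate its guids differently.

```python
import hashlib

def make_guid(url, title, mode="0"):
    """Return a stable identifier built from the URL, the title, or both."""
    if mode == "0":
        return None  # guid omitted (the default)
    fields = {"url": [url], "title": [title], "url_title": [url, title]}[mode]
    return hashlib.sha1("\n".join(fields).encode("utf-8")).hexdigest()

url = "http://example.org/article?id=879"

print(make_guid(url, "Original headline"))               # None
print(make_guid(url, "Original headline", "url"))
print(make_guid(url, "Updated headline", "url"))         # same as the line above
print(make_guid(url, "Updated headline", "url_title"))   # changes with the title
```

Basing the guid on the URL alone keeps an item's identity stable when its headline is edited, while including the title makes an edited headline look like a new item.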

Improved CSS support

We now use Symfony’s css-selector to convert CSS into XPath. This allows you to use more CSS selectors than before.

In addition to accepting multiple selectors (comma-separated) in the main item selector field, in version 2.2 Feed Creator accepts multiple selectors in all item fields (e.g. item title, item URL). This comes in useful when the source page uses different HTML structures to hold the items you need. For example, a news site might have the top news item appear larger than the other news items on the page, marked up with different HTML.

Let’s take the following HTML as an example. It contains 4 news items:

<div id="top-story">
  <h1>Big jump in vaccine supply is coming soon</h1>
  <img src="https://cdn.example.org/vaccine.jpg">
  <a href="/big-jump-in-vaccine.html">Read more</a>
</div>

<div class="more-stories">
  <h2><a href="/first-hearing.html">
    First hearing on Capitol riot
  </a></h2>
  <h2><a href="/white-house-promises.html">
    White House promises vaccine help
  </a></h2>
  <h2><a href="/leaders-of-texas-grid.html">
    Leaders of Texas’ grid operator resign
  </a></h2>
</div>

To select all 4 items in Feed Creator, you can enter the following into the Item selector (CSS) field:

#top-story, .more-stories h2

And in the Item title selector (CSS) field:

h1, a

Note that the main item selector can return multiple elements (4 in our example). The selectors you enter in the other item fields (item title, description, date, image, or URL) might match more than one element, but only the first matching element (based on the position in the source HTML) is used.
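Here is a rough Python sketch of that behaviour, using the standard library's ElementTree on the example HTML. The markup is wrapped in a <body> element and the <img> tag is self-closed so it parses as XML. The XPath expressions only approximate the CSS selectors, and the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

PAGE = """
<body>
<div id="top-story">
  <h1>Big jump in vaccine supply is coming soon</h1>
  <img src="https://cdn.example.org/vaccine.jpg" />
  <a href="/big-jump-in-vaccine.html">Read more</a>
</div>
<div class="more-stories">
  <h2><a href="/first-hearing.html">
    First hearing on Capitol riot
  </a></h2>
  <h2><a href="/white-house-promises.html">
    White House promises vaccine help
  </a></h2>
  <h2><a href="/leaders-of-texas-grid.html">
    Leaders of Texas’ grid operator resign
  </a></h2>
</div>
</body>
"""

root = ET.fromstring(PAGE)

# Item selector "#top-story, .more-stories h2", approximated as two lookups
# (the second uses a child step, which is enough for this markup).
items = (
    root.findall(".//*[@id='top-story']")
    + root.findall(".//*[@class='more-stories']/h2")
)

def first_match(item, tags):
    """Return the first element in source order whose tag is in tags."""
    for el in item.iter():
        if el.tag in tags:
            return el
    return None

# Title selector "h1, a": in the top story both <h1> and <a> match,
# but only the first match (<h1>) is used.
titles = ["".join(first_match(item, {"h1", "a"}).itertext()).strip() for item in items]
urls = [first_match(item, {"a"}).get("href") for item in items]

for title, url in zip(titles, urls):
    print(title, url)
```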

Feedback

Please try out this new version and let us know what you think.

Existing customers can download or upgrade from the customer page.

Changelog

Feed Creator 2.2.1 (2021-03-06)

  • Bug fix: Warning when running on PHP 8 and using cURL for HTTP handling (fixed in RollingCurl.php)
  • HTML5-PHP library updated to version 2.7.4
  • SimplePie library updated to version 1.5.6

Feed Creator 2.2 (2021-02-26)

  • Allow multiple comma-separated selectors in item and item_* parameters (useful when items are spread across different HTML structures)
  • Use Symfony CSS selector library – allows for more specific element targeting, e.g. a[title], p:nth-child(2), img[src*="large"], see https://developer.mozilla.org/docs/Web/CSS/Attribute_selectors
  • New item_image parameter to select an image for an item (if available)
  • Added support for attribute selector (@attr) to the item_url parameter
  • New guid parameter to specify if and how guid element should be generated: ‘0’=ignore (default), ‘url_title’=URL+title, ‘url’=URL, ‘title’=title
  • New proxy parameter to name the proxy server to be used for the request (set up in config file)
  • New keep_qs_params parameter for cleaning the query string in item URLs: ‘1’=keep all (default), ‘0’=strip all, or a comma-separated list of field names to preserve (e.g., ‘id’ or ‘id,cat’)
  • Added Feed Control – https://feedcontrol.fivefilters.org – as a subscribe option
  • Added BazQux Reader – https://bazqux.com – as a subscribe option
  • Bug fix in mergefeeds.php: Item titles and URLs (and feed title and description) no longer double-encoded when they contain characters that need encoding
  • PHP 8 compatible
  • Other minor improvements

The post Feed Creator 2.2 appeared first on FiveFilters.org.

]]>
Feed Control, Full-Text RSS, Feed Creator: Which to choose? https://www.fivefilters.org/2020/feed-control-comparison/ Mon, 28 Dec 2020 09:31:00 +0000 https://www.fivefilters.org/2020/feed-control-comparison/ We now offer a few products for working with feeds. This post looks at the differences between them to help you choose the right one.

The post Feed Control, Full-Text RSS, Feed Creator: Which to choose? appeared first on FiveFilters.org.

]]>
Since launching Feed Control, some of you have asked how it compares to Full-Text RSS and Feed Creator. This post will try to answer that.

TLDR: If you’re not a developer, and have no need to run our tools on your own server, choose Feed Control. If you’re a developer, read on.

Our feed products are used in combination with feed reading applications such as Feedly, Newsblur, Fraidycat, and many others, but also by developers who need custom integrations with their own applications and workflows, usually in relation to monitoring and extracting information from blogs and news publications.

We’ll be taking both types of use into account when comparing the products, but because the solutions differ more when evaluated from a developer perspective, we’ll try to focus more on that angle when comparing. If you’re not a developer, you can ignore the parts aimed at developers.

Full-Text RSS

Full-Text RSS is our feed expansion application. It takes a partial feed (e.g. a feed which only contains a short summary of each article) and converts it to a full-text feed by pulling in the full article content for each item.

If you enjoy reading full articles within your news reading application and not having to click into the site itself, Full-Text RSS can help.

If you’re a developer, Full-Text RSS can also be used to extract article content from individual articles. Instead of giving it a feed URL, give it an article URL and it will try to extract the article content and return it along with additional information that might be useful (e.g. language, author).

Full-Text RSS can be used as a hosted service run by us, or bought to run yourself on your own server. We also offer the service via RapidAPI for developers, which is a great way to get up and running integrating it into your own application.

Feed Creator

Feed Creator is our feed creation application. It has two main uses:

  1. Creating feeds from web pages which don’t offer their own
  2. Filtering or merging existing feeds

To create a feed from a web page, you give it the web page URL and some selectors for the content you want extracted. When filtering or merging existing feeds, you give it feed URLs and keywords or URL segments to use as filters.

Feed Creator can be used as a hosted service run by us, or bought to run yourself on your own server.

Feed Control

Feed Control builds on both of the above. You can use it to create full-text feeds and also generate feeds from a web page.

Full-Text RSS and Feed Creator are more bare-bones than Feed Control. They have been developed to be fast, efficient, and stateless, with a small server footprint. For developers, they are ideally used as microservices, and can easily be set up on a server and scaled.

While that approach has benefits, keeping those applications small and lean (which we’re committed to doing) also limits what we can implement as part of each service. Feed Control is our effort to make both of those tools a little more accessible and to add features many of our users have requested over the years.

Below, we’re going to look at some of the features of Feed Control that are currently not available in Full-Text RSS and Feed Creator. (We have plans for more, which we’ll be covering here in due time.)

Note: Feed Control is currently only available as a hosted service. We have no plans yet to offer a self-hosted version.

User interface for managing feeds

Feed Control

Create an account to store your feeds. View and manage feeds from the admin interface.

Full-Text RSS / Feed Creator

There is no record of the feeds you’ve created.

Faster feed access via a CDN

Feed Control

When you enable RSS or JSON generation for a feed, the resulting file is stored on a content delivery network (CDN) for fast access. This is automatically updated as new items for the feed are pulled in.

Full-Text RSS / Feed Creator

Full-Text RSS and Feed Creator process and generate feeds on an ad-hoc basis, as requests come in (with some caching to increase performance). This approach can result in delays when returning content, especially if the source feed is on a server that’s slow to respond.

Twitter feeds

Feed Control

Monitor and generate feeds from a user’s Twitter timeline, with or without retweets.

Full-Text RSS / Feed Creator

Not available.

Email alerts

Feed Control

Enable email alerts for a feed to receive a notification for each new item detected, or a daily, weekly, or monthly summary.

Full-Text RSS / Feed Creator

Not available.

Webhooks

Webhooks are intended for developers.

Feed Control

Enable webhooks for a feed, and Feed Control will send your application data for each new item. It will send the original HTML; a stripped-down, sanitized version; and plain text.

It’s a great alternative to polling feeds for updates in your application, and also a nice solution for serverless setups. Read more about webhooks in our documentation, including basic code examples.
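As a rough illustration of the receiving side, here is a minimal webhook endpoint written with only Python’s standard library. The payload shape is an assumption made for this sketch: the field names title, url, and text are hypothetical placeholders, as is the port number, so check the webhook documentation for the actual format before relying on it.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_item(body: bytes) -> dict:
    """Decode a JSON webhook body and pick out a few fields.

    The keys used here (title, url, text) are hypothetical; adjust
    them to match the payload your webhook actually receives.
    """
    data = json.loads(body.decode("utf-8"))
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return {
        "title": data.get("title", ""),
        "url": data.get("url", ""),
        "text": data.get("text", ""),
    }


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            item = parse_item(self.rfile.read(length))
        except (ValueError, UnicodeDecodeError):
            # Malformed body: reject it and move on.
            self.send_response(400)
            self.end_headers()
            return
        print(f"New item: {item['title']} ({item['url']})")
        # 204 tells the sender the item was accepted, with no response body.
        self.send_response(204)
        self.end_headers()


def run(port: int = 8000) -> None:
    """Start the server; blocks until interrupted."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Calling run() starts the listener. Because each new item is pushed to the endpoint as it is detected, the application never has to poll the feed itself.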

Full-Text RSS / Feed Creator

Not available.

Generate stripped-down HTML

Feed Control

Feed items that contain a lot of HTML with non-essential elements or styling can be stripped down further to improve display. You can turn this option on when enabling feed generation in Feed Control.

Full-Text RSS / Feed Creator

Not available.

Filtering

Feed Control

Set up filters to only include items of interest, e.g. ignore a tweet unless it mentions ‘covid’, or alternatively, ignore tweets that do mention ‘covid’. You can filter on the item URL too, e.g. ignoring items unless they contain the segment ‘/news/’ in the URL.

Full-Text RSS / Feed Creator

Filtering is not available in Full-Text RSS, but is available in Feed Creator.
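The include, exclude, and URL-segment filters described above can be sketched in a few lines of Python. This is only an illustration of the logic, not how either product implements it: the Item class and keep_item function are hypothetical names, and case-insensitive keyword matching is an assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass
class Item:
    title: str
    url: str


def keep_item(item, include=None, exclude=None, url_segment=None):
    """Return True if the item passes every filter that is set.

    include:     keep only items whose title mentions this keyword
    exclude:     drop items whose title mentions this keyword
    url_segment: keep only items whose URL contains this segment
    """
    title = item.title.lower()
    if include and include.lower() not in title:
        return False
    if exclude and exclude.lower() in title:
        return False
    if url_segment and url_segment not in item.url:
        return False
    return True


items = [
    Item("Covid cases fall", "https://example.com/news/covid-cases"),
    Item("Cup final preview", "https://example.com/sport/cup-final"),
]

only_covid = [i for i in items if keep_item(i, include="covid")]
no_covid = [i for i in items if keep_item(i, exclude="covid")]
news_only = [i for i in items if keep_item(i, url_segment="/news/")]
```

Here only_covid and news_only each keep the first item, while no_covid keeps the second.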

Try for free

We think Feed Control is the best place to start for most users, but you can try all three for free.

Feedback or questions?

If you have any questions or feedback about our tools, or are unsure which one is right for you, feel free to get in touch on our forum, by email, or on Twitter.

The post Feed Control, Full-Text RSS, Feed Creator: Which to choose? appeared first on FiveFilters.org.

Feed Creator 2.1 https://www.fivefilters.org/2020/feed-creator-2-1/ Tue, 30 Jun 2020 07:30:00 +0000 Feed Creator 2.1 is now available. The main changes are the ability to select attributes and submit HTTP headers, plus a few bug fixes.

The post Feed Creator 2.1 appeared first on FiveFilters.org.

Feed Creator 2.1 is now available. With Feed Creator you can generate RSS feeds for web sites which don’t offer their own. You can also filter and merge existing feeds to remove items that you don’t need. If you’re new to Feed Creator, read the guide.

Existing customers can download the latest version through our customer login. If you’d like to use the hosted premium service, you’ll find the button to purchase below the main form.

Select attributes

It’s now possible to select attribute values when constructing a CSS selector for item title, item description, and item date. To do so, add @attribute-name to the end of the selector.

For example, the date selector time @datetime will select the datetime attribute in <time datetime="2020-06-30 10:30:00 GMT">1 hour ago</time>.

Another example: the item title selector img.main @alt will use the alt attribute text of an image element as the item title.
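To make the @attribute suffix concrete, the following Python sketch splits a selector such as img.main @alt into its CSS part and its attribute name, then reads that attribute from the first matching element. It is an illustration only and not Feed Creator’s own code: the function and class names are hypothetical, and the matcher only understands simple tag or tag.class selectors.

```python
from html.parser import HTMLParser


def split_selector(selector):
    """Split 'img.main @alt' into ('img.main', 'alt').

    Returns (selector, None) when there is no @attribute suffix.
    """
    base, sep, attr = selector.rpartition("@")
    if not sep:
        return selector.strip(), None
    return base.strip(), attr.strip()


class FirstMatch(HTMLParser):
    """Record the attributes of the first element matching 'tag' or 'tag.class'."""

    def __init__(self, simple_selector):
        super().__init__()
        self.want_tag, _, self.want_class = simple_selector.partition(".")
        self.found = None

    def handle_starttag(self, tag, attrs):
        if self.found is not None or tag != self.want_tag:
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if not self.want_class or self.want_class in classes:
            self.found = attrs


def select_attribute(html, selector):
    """Return the attribute value named by the @suffix, or None if absent."""
    base, attr = split_selector(selector)
    parser = FirstMatch(base)
    parser.feed(html)
    if parser.found is None or attr is None:
        return None
    return parser.found.get(attr)


date_value = select_attribute(
    '<time datetime="2020-06-30 10:30:00 GMT">1 hour ago</time>',
    "time @datetime",
)
title_value = select_attribute(
    '<img class="main hero" src="a.png" alt="A red kite">',
    "img.main @alt",
)
```

With the two snippets above, date_value holds the machine-readable timestamp rather than the visible text "1 hour ago", and title_value holds the image’s alt text.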

Send HTTP headers

It’s now possible to submit cookies, referer URLs and user agent strings as HTTP headers with the three new fields in the custom HTTP header section.
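For developers calling Feed Creator directly rather than through the form, a request URL could be assembled along the following lines. The parameter names ua, cookie, referer, item, item_title, and item_date, and the extract.php endpoint, come from the changelog for this release; the host name and the url parameter for the source page are assumptions made for this sketch, so verify them against your own installation.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical location of a self-hosted Feed Creator install.
BASE = "https://feeds.example.com/extract.php"


def feed_url(page_url, item, item_title, item_date=None,
             ua=None, cookie=None, referer=None):
    """Build a Feed Creator request URL, skipping options left unset."""
    # 'url' as the name of the source page parameter is an assumption.
    params = {"url": page_url, "item": item, "item_title": item_title}
    optional = {
        "item_date": item_date,
        "ua": ua,            # User-Agent header string
        "cookie": cookie,    # Cookie header string
        "referer": referer,  # Referer header string
    }
    params.update({k: v for k, v in optional.items() if v is not None})
    return BASE + "?" + urlencode(params)


example = feed_url(
    "https://example.com/news",
    item="article",
    item_title="img.main @alt",
    item_date="time @datetime",
    ua="MyReader/1.0",
)
```

urlencode takes care of escaping the spaces and @ signs in the selectors, so they survive the trip as query string values.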

Fraidycat as a subscribe option

We’ve now added Fraidycat as a subscribe option. Do check out the site if you haven’t come across it before.

Expanded inline help

We’ve added more information and examples in help boxes which open when you click the question mark icons next to some of the fields on the form. This replaces the small tooltips we used before.

We’ve also made these available on our help site.

Full changelog

  • Bug fix: URLs with non-ASCII characters now handled better
  • Bug fix: Case-sensitive matching was used for text_contains and strip_if_text values when using in_id_or_class selector (now case-insensitive, matching behaviour in advanced-selector mode)
  • Specify element attribute with @attr to get text content from an attribute value. Use with item_title, item_date, item_desc: e.g. item title selector can be ‘img @alt’ or ‘a.story @title’
  • When using advanced selectors, item_* parameters can now select context element itself (specified in item parameter) with ‘:scope’ as selector
  • Max items value in $options->max_items now used in form
  • When handling dates, relative dates (e.g. ‘10 hours ago’ or ‘4:03am’) will be ignored (otherwise the item date could change on subsequent requests)
  • Subscribe options can be modified in config ($options->subscribe_shortcuts)
  • Added Fraidycat – https://fraidyc.at – as a subscribe option
  • Added field for entering User-Agent HTTP header string (&ua=… parameter in extract.php)
  • Added field for entering Cookie HTTP header string (&cookie=… parameter in extract.php)
  • Added field for entering Referer HTTP header string (&referer=… parameter in extract.php)
  • Change default User-Agent HTTP header string in config ($options->user_agent_default)
  • Change default Referer HTTP header string in config ($options->referer_default)
  • Set cache time in config with $options->cache_time (60 minutes by default)
  • Expanded inline help with popup boxes which appear after clicking the question mark icons next to field labels
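The changelog entry about relative dates is easier to picture with an example. A value such as ‘10 hours ago’ resolves to a different timestamp every time the page is fetched, so an item’s date would drift between requests. The Python sketch below shows one way such values could be detected and skipped. The patterns are a guess made for illustration and are not Feed Creator’s actual rules.

```python
import re

# Matches strings like "10 hours ago", "an hour ago", "yesterday",
# or a bare time of day such as "4:03am" (which has no date attached).
RELATIVE_DATE = re.compile(
    r"^\s*(?:"
    r"(?:\d+|an?)\s+(?:second|minute|hour|day|week|month|year)s?\s+ago"
    r"|just now|today|yesterday"
    r"|\d{1,2}:\d{2}\s*(?:am|pm)?"
    r")\s*$",
    re.IGNORECASE,
)


def is_relative_date(text):
    """Return True if the text looks like a relative date or a bare time."""
    return RELATIVE_DATE.match(text) is not None


def usable_date(text):
    """Return the text unchanged if it can serve as an item date, else None."""
    return None if is_relative_date(text) else text
```

Under these patterns, usable_date returns None for ‘10 hours ago’ and ‘4:03am’, while a full timestamp such as ‘2020-06-30 10:30:00 GMT’ is passed through unchanged.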
