Full-Text RSS – FiveFilters.org
https://www.fivefilters.org
Web articles made accessible
Mon, 10 May 2021 10:35:25 +0000

Google News RSS feeds
https://www.fivefilters.org/2021/google-news-rss-feeds/
Thu, 06 May 2021 22:21:19 +0000

It's possible to get a variety of RSS feeds from Google News, and these all come from Google itself, so they don't have to be generated with third-party tools like our Feed Creator.

The post Google News RSS feeds appeared first on FiveFilters.org.

It’s possible to get a variety of RSS feeds from Google News, and they all come from Google itself, so they don’t have to be generated with third-party tools like our Feed Creator.

In this post we’ll show you how to get RSS feeds for top stories, topics, search results, and site-specific feeds. Each section will show you how to use Google News to get the news items you want, and then how to get those same results as an RSS feed.

Top stories

Google’s top stories are at: https://news.google.com/topstories

When you load that page, Google will set a country for you (e.g. US or UK) and show you the top stories for the audience of that country. The URL will change to reflect the language and country.

News for a US audience: https://news.google.com/topstories?hl=en-US&gl=US&ceid=US:en

If you’d like the top stories for a different region or in a different language, find ‘Language & region’ in the left sidebar (towards the bottom) and click it. You’ll then be able to select a different language and region.

Google News language & region selection

After making your selection, the news items will update, and if you look at your browser’s address bar you’ll notice the URL has changed to reflect your selection. Google uses language codes (lowercase letters) and country codes (uppercase letters) in its URLs, as in the US example above: hl=en-US, gl=US and ceid=US:en.

RSS feeds for top stories

To get an RSS feed for the top stories you want, replace ‘topstories’ in the URL with ‘rss’. The US URL above, for example, becomes https://news.google.com/rss?hl=en-US&gl=US&ceid=US:en

You can make this URL replacement in your browser’s address bar to view the RSS source.

Google News website and RSS feed side by side.

Copy and paste the RSS feed URL into your favourite news reader to subscribe to it and receive updates.
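
If you'd rather script the replacement than edit the address bar, here is a minimal Python sketch. It assumes the URL shape shown in this post; the function name top_stories_rss is ours for illustration and is not part of any Google API.

```python
from urllib.parse import urlsplit, urlunsplit


def top_stories_rss(url: str) -> str:
    """Swap the 'topstories' path for 'rss', keeping the query string untouched."""
    parts = urlsplit(url)
    if parts.path.rstrip("/") != "/topstories":
        raise ValueError("expected a Google News top stories URL")
    return urlunsplit(parts._replace(path="/rss"))


print(top_stories_rss("https://news.google.com/topstories?hl=en-US&gl=US&ceid=US:en"))
```

The language and country parameters are carried over unchanged, so the feed matches whatever region you selected on the website.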

Topics

Google also provides news for different topics, such as Technology, Health and Entertainment.

You can select from a set of main topics using the links in the left sidebar, or use the topic search to find trending topics or search for topics.

Google News topic selection

RSS feeds for topics

To get RSS feeds for the topics you want, replace ‘/topics’ in the URL with ‘/rss/topics’:

  • [RSS] Technology
    https://news.google.com/rss/topics/CAAqJggKIiBDQkFTRWdvSUwyMHZNRGRqTVhZU0FtVnVHZ0pIUWlnQVAB?hl=en-US&gl=US&ceid=US%3Aen
  • [RSS] Health
    https://news.google.com/rss/topics/CAAqIQgKIhtDQkFTRGdvSUwyMHZNR3QwTlRFU0FtVnVLQUFQAQ?hl=en-US&gl=US&ceid=US%3Aen
  • [RSS] Entertainment
    https://news.google.com/rss/topics/CAAqJggKIiBDQkFTRWdvSUwyMHZNREpxYW5RU0FtVnVHZ0pWVXlnQVAB?hl=en-US&gl=US&ceid=US%3Aen

You can change the language and region for topics too in the same way as we did for top stories (either using the Google News interface or by modifying the URL and changing the language and country codes).
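
Both changes (the ‘/rss’ prefix and the language and country codes) can be made in one step. The Python sketch below assumes the three parameters follow the pattern of the US example (hl=<language>-<COUNTRY>, gl=<COUNTRY>, ceid=<COUNTRY>:<language>). Some regions may use a different hl value, so check the URL Google shows in your address bar. The function name is hypothetical.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def topic_rss(url: str, lang: str = "en", country: str = "US") -> str:
    """Turn a Google News topic URL into its RSS equivalent for a given locale."""
    parts = urlsplit(url)
    if not parts.path.startswith("/topics/"):
        raise ValueError("expected a Google News topic URL")
    # Assumed pattern, taken from the US example in this post.
    query = urlencode({
        "hl": f"{lang}-{country}",
        "gl": country,
        "ceid": f"{country}:{lang}",
    })
    return urlunsplit(parts._replace(path="/rss" + parts.path, query=query))


technology = "https://news.google.com/topics/CAAqJggKIiBDQkFTRWdvSUwyMHZNRGRqTVhZU0FtVnVHZ0pIUWlnQVAB?hl=en-US&gl=US&ceid=US%3Aen"
print(topic_rss(technology))              # US edition
print(topic_rss(technology, "en", "GB"))  # same topic, UK edition
```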

Advanced search

Google News also lets you get news items based on your search criteria. Click the arrow at the end of the search field to open the advanced search form.

Google News advanced search

Examples:

RSS feeds for advanced search

To get RSS feeds for search results, replace ‘/search’ in the URL with ‘/rss/search’.
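
As a rough illustration, this Python sketch builds the RSS URL directly from the text you would type into the search field. It assumes the query is passed in a q parameter alongside the same hl, gl and ceid parameters used for top stories; the function name and the sample query are ours.

```python
from urllib.parse import urlencode


def search_rss_url(query: str, hl: str = "en-US", gl: str = "US", ceid: str = "US:en") -> str:
    """Build a Google News RSS search URL (the 'q' parameter name is an assumption)."""
    params = {"q": query, "hl": hl, "gl": gl, "ceid": ceid}
    return "https://news.google.com/rss/search?" + urlencode(params)


print(search_rss_url("electric cars"))
```

urlencode takes care of escaping spaces, colons and other reserved characters, which matters once search operators are involved.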

News from a particular site

What if we’re only interested in news from a particular site? If it’s indexed by Google News we can limit results to that site using the same advanced search fields covered in the previous section.

Not all blogs and news sites indexed by the main Google search engine are available in Google News.

Here’s how you’d get recent news items indexed by Google News from The Grayzone website:

Telling Google News to return news items posted on The Grayzone in the last week

This will result in the following search query: site:thegrayzone.com when:7d

The Grayzone has its own feed, so unless we want to narrow the search down further, we’d be better off using the official feed. And with Google’s hostile attitude to independent news sites we’d advise against relying on Google News feeds for non-corporate news sources whenever possible.

But there are sites indexed by Google News that don’t publish their own feeds. Using Google News’ feed output, you can get RSS feeds for these sites. Reuters is one example. You can get reuters.com news items published in the last hour using this Google News query: site:reuters.com when:1h

(For sites that don’t publish feeds and aren’t on Google News, our Feed Creator service might help.)

It’s also possible to include a path segment in the site: operator to limit results to a specific category. For example, Reuters publishes its technology news at https://www.reuters.com/technology/, so you can limit Google News results to news items from this category by using: site:reuters.com/technology

RSS feeds for specific sites

We’re still using the search endpoint here, so as before, replace ‘/search’ in the URL with ‘/rss/search’.
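
Here is a small Python sketch that turns a site (optionally with a path segment) and a time window into a feed URL. The site: and when: operators come from the queries above; the q parameter name, the fixed US locale and the function name are assumptions for illustration.

```python
from urllib.parse import urlencode


def site_feed_url(site: str, when: str = "7d") -> str:
    """Feed URL for items from one site, e.g. site='reuters.com/technology', when='1h'."""
    query = f"site:{site} when:{when}"
    params = {"q": query, "hl": "en-US", "gl": "US", "ceid": "US:en"}
    return "https://news.google.com/rss/search?" + urlencode(params)


print(site_feed_url("reuters.com", "1h"))
print(site_feed_url("reuters.com/technology"))
```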

News from a number of specific sites

One useful feature of Google News is being able to combine searches. We can, for example, tell Google News to return news items from two or three specific sites in a single feed.

After you run a search, you can edit the query by hand in the search field to use additional operators.

Let’s say we want news items published in the last 7 days from two sites, The Grayzone and MintPress News. Here’s the query to use:

site:thegrayzone.com OR site:mintpressnews.com when:7d

Want another site, say Fairness and Accuracy in Reporting? Add it with another ‘OR’:

site:thegrayzone.com OR site:mintpressnews.com OR site:fair.org when:7d

Google News search results for The Grayzone and MintPress News.

RSS feeds for specific sites

As before, we’re still using the search endpoint, so replace ‘/search’ in the URL with ‘/rss/search’.
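
Joining sites with OR is easy to automate. The sketch below (Python, hypothetical function names, q parameter and US locale assumed as before) builds the combined query and the matching feed URL.

```python
from urllib.parse import urlencode


def multi_site_query(sites: list[str], when: str = "7d") -> str:
    """Combine several site: operators with OR and append a when: window."""
    if not sites:
        raise ValueError("at least one site is required")
    return " OR ".join(f"site:{site}" for site in sites) + f" when:{when}"


def multi_site_feed_url(sites: list[str], when: str = "7d") -> str:
    params = {"q": multi_site_query(sites, when), "hl": "en-US", "gl": "US", "ceid": "US:en"}
    return "https://news.google.com/rss/search?" + urlencode(params)


sites = ["thegrayzone.com", "mintpressnews.com", "fair.org"]
print(multi_site_query(sites))
print(multi_site_feed_url(sites))
```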

So that’s how you get Google News RSS feeds. If you have any questions, feedback or suggestions, please let us know on our forum.

How to use proxy servers with Feed Creator and Full-Text RSS
https://www.fivefilters.org/2021/proxy-servers-in-feed-creator-and-full-text-rss/
Fri, 30 Apr 2021 12:22:38 +0000

In this post we'll look at how to configure the self-hosted versions of our Feed Creator and Full-Text RSS software to use proxy servers.

The post How to use proxy servers with Feed Creator and Full-Text RSS appeared first on FiveFilters.org.

In an earlier post we looked at how routing requests through proxy servers could help with content retrieval for some sites. We also showed you how to enable proxy use in our Feed Control service for feeds that need it.

In this post we’ll look at how to configure the self-hosted versions of our Feed Creator and Full-Text RSS software to use proxy servers.

Storm Proxies as proxy provider

You can use any proxy service provider you like, but in this guide we’ll be using Storm Proxies, specifically its Dedicated Rotating Proxies. That’s what we use in our Feed Control service and it’s worked well for us for the feeds which have needed proxy routing. (If you encounter feeds that need a more specialised solution, you can still follow these steps but enter the details associated with whatever service you choose to use.)

With this particular service, Storm Proxies requires you to use IP authentication, so the IP address of the server hosting our software needs to be registered in your Storm Proxies account.

Storm Proxies client area

If you’ve set up our software on a VPS, you’ll find the server’s IP address in your account with the VPS provider. That’s what you’d enter in the Authorized IPs field.

Depending on the package you purchase from Storm Proxies, you will be able to enter between 1 and 5 IP addresses in this field. The cheapest package ($14/month) only allows 1 IP address and 10 simultaneous connections. If you’re running our software on a single VPS and can ensure feed requests won’t exceed the limit of 10 simultaneous connections, this package will work fine. Otherwise, Storm Proxies offers bigger packages that increase the number of simultaneous connections and IP addresses.

Proxy configuration in Full-Text RSS/Feed Creator

Both Full-Text RSS and Feed Creator let you configure how proxy servers are used. Each application supports three modes:

  1. Never use proxy
  2. Always use proxy
  3. Only use proxy when &proxy request parameter is used

A proxy server should only be needed in situations where a direct connection doesn’t work (see our previous post for more details), so we recommend option 3 above. To enable that mode, you’ll want to edit the config file for Full-Text RSS/Feed Creator.

Best practice for both applications is to set up a custom config file (instructions in the README.txt file distributed with each application) so that future updates to the software don’t overwrite your settings.

There are 3 configuration options in the config file that you’ll want to edit. Let’s go through them one by one.

Step 1: Enter proxy server IP and port

After you create your Storm Proxies account, you’ll be given an IP address and port number. You’ll actually be given a set of 3 IPs: Main gateway, 3-minute gateway, 15-minute gateway.

Proxy gateways provided by Storm Proxies

We recommend using the main gateway IP in our software.

$options->proxy_servers = ['stormproxies' => ['host' => 'x.x.x.x:xxxxx']];

Make sure to edit the above and replace ‘x.x.x.x:xxxxx’ with the IP address and port (number after the colon) for the main gateway displayed in your Storm Proxies account.

If you’re using a different proxy service that requires a username and password for authentication, you can add that with the ‘auth’ key, e.g.: ['host'=>'x.x.x.x:xxxxx', 'auth'=>'user:pass']

Step 2: Disable proxy use by default

We don’t want the proxy server to be used for every request, so let’s disable it by default.

$options->proxy = false;

Step 3: Allow proxy use per request

But we do want to be able to indicate that the proxy server should be used for certain requests, so we want to enable proxy override.

$options->allow_proxy_override = true;

Using the proxy service

With the above changes, you won’t notice any difference when using Feed Creator or Full-Text RSS, and won’t see an option for choosing a proxy server from the interface of either application.

To indicate that the proxy server you entered should be used, you will need to change the feed URL generated by Full-Text RSS or Feed Creator and add &proxy=stormproxies as a request parameter.

Full-Text RSS example

.../full-text-rss/makefulltextfeed.php?url=example.org%2Ffeed&proxy=stormproxies

Feed Creator example

.../feed-creator/extract.php?url=example.org...&proxy=stormproxies
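
If you manage many feed URLs, adding the parameter by hand gets tedious. Here is a minimal Python sketch that appends (or overwrites) request parameters on an existing feed URL. The host rss.example.com is a placeholder for wherever your copy is installed, and the helper name is ours.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_params(feed_url: str, **extra: str) -> str:
    """Return feed_url with the given request parameters added or replaced."""
    parts = urlsplit(feed_url)
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    params.update(extra)
    return urlunsplit(parts._replace(query=urlencode(params)))


# Placeholder host: substitute the address of your own installation.
feed = "https://rss.example.com/full-text-rss/makefulltextfeed.php?url=example.org%2Ffeed"
print(with_params(feed, proxy="stormproxies"))
```

The value passed as proxy must match a name defined in $options->proxy_servers in your config file.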

Failing requests?

When using a rotating proxy service, the results can be hit and miss because requests are routed through different servers. If something doesn’t load, try refreshing to route through a different server. Feeds are intended to be polled regularly for updates, so the occasional failing request shouldn’t be of concern.

In Full-Text RSS, however, if content cannot be retrieved (due to an extraction, connection, or proxy failure) the feed item will still be returned, but with the message [unable to retrieve full-text content]. When using a rotating proxy, you will probably want to tell Full-Text RSS to exclude items that cannot be retrieved, because a future attempt through a different server might succeed. To do that, add the parameter &exc=1:

.../full-text-rss/makefulltextfeed.php?url=example.org%2Ffeed&proxy=stormproxies&exc=1

This can also be enabled via the Full-Text RSS interface by selecting ‘remove item from feed’ in the ‘If extraction fails’ field.
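
If you poll such feeds from your own scripts, the "try again" advice can be automated. The Python sketch below retries a fetch a few times before giving up. The fetch argument is a hypothetical callable that you supply (for example, one that downloads the feed URL and returns its body, or None on failure); nothing here is part of Full-Text RSS itself.

```python
import time
from typing import Callable, Optional


def fetch_with_retries(
    fetch: Callable[[], Optional[str]],
    attempts: int = 3,
    delay: float = 0.0,
) -> Optional[str]:
    """Call fetch() until it returns content or the attempts run out."""
    for attempt in range(attempts):
        try:
            result = fetch()
        except OSError:  # connection errors, timeouts, urllib's URLError
            result = None
        if result:
            return result
        if attempt < attempts - 1:
            time.sleep(delay)  # each new request may go out through a different server
    return None
```

With a rotating proxy, every retry is likely to be routed through a different exit server, which is why a short retry loop is often enough.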

More than one proxy service?

To use different proxy services for different feeds, you can enter more in the configuration file and give each one a unique name. Then pass the name of the proxy service that should be used in the proxy request parameter. So some feeds might have &proxy=proxy1 and others &proxy=proxy2.

Here’s how the configuration might look:

$options->proxy_servers = [
  'proxy1' => ['host' => 'x.x.x.x:xxxxx'], 
  'proxy2' => ['host' => 'x.x.x.x:xxxxx', 'auth' => 'user:pass']
];

Testing the proxy service

The best way to test if the proxy service is working is by fetching a page which shows you the IP address making the request. We’ll use myexternalip.com here. If you load that now, it should show you the IP address associated with your current connection.

When using our software without a proxy service, the IP address shown from such a page will be the IP address of the server running Full-Text RSS or Feed Creator. When the same request goes through a proxy service, the IP address shown will be one connected to the proxy service, not your server.

Testing Full-Text RSS

  1. Enter the URL https://myexternalip.com/ into the URL field in Full-Text RSS and hit ‘Create Feed’
  2. You should see the IP address of the server Full-Text RSS is hosted on in the results:
    “My External IP address – [your server IP]”
  3. Edit the Full-Text RSS URL in your browser’s address bar based on the instructions in the previous section, so the final URL will look something like: .../full-text-rss/makefulltextfeed.php?url=myexternalip.com&proxy=stormproxies
  4. Now you should be shown a different IP address:
    “My External IP address – [IP associated with proxy service]”

When testing Full-Text RSS with pages that show you the IP address making the request, be aware that Full-Text RSS only returns content when it can extract what it determines is the main content element on a page. It’s designed for web articles such as news stories and blog posts, so will typically look for clues, such as a series of paragraphs, to help it identify the article body. When used on pages that aren’t structured like a text article, it often won’t find a suitable element and won’t return a result at all. You’ll instead see the message ‘[unable to retrieve full-text content]’. At the time of writing, it is able to extract from the myexternalip.com page used in the steps above.

Some sites, such as https://api.ipify.org and https://ip.seeip.org, simply return the IP as text with no HTML. To get Full-Text RSS to display these, you should use &debug=rawhtml in the request, e.g.: .../full-text-rss/makefulltextfeed.php?url=api.ipify.org&proxy=stormproxies&debug=rawhtml

You’ll then see the HTTP response from the server, including the IP address at the bottom.

Testing Feed Creator

  1. Enter the URL https://myexternalip.com/ into the URL field in Feed Creator and hit ‘Preview’
  2. You should see the IP address of the server Feed Creator is hosted on in the results:
    “My External IP address – [your server IP]”
  3. Click the RSS Feed button to load the RSS feed in your browser
  4. Edit the URL in your browser’s address bar based on the instructions in the previous section, so the final URL will look something like: .../feed-creator/extract.php?url=myexternalip.com&proxy=stormproxies
  5. Now you should be shown a different IP address:
    “My External IP address – [IP associated with proxy service]”

If you’re using the rotating proxy service from Storm Proxies as described in this guide, the IP address you see connected to the proxy service won’t be the IP address of the gateway you entered in the configuration file. It will be different, and should change on each request (provided you’ve not enabled caching in Full-Text RSS/Feed Creator).
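
The comparison in the steps above can also be done in code. This Python sketch pulls the first IPv4 address out of two pieces of text (the output without the proxy and the output with it) and reports whether they differ. It only handles IPv4, the sample strings use reserved documentation addresses, and the function names are hypothetical.

```python
import re

# Loose IPv4 pattern: good enough for spotting an address in page text.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def first_ip(text: str) -> str:
    match = IPV4.search(text)
    if match is None:
        raise ValueError("no IPv4 address found")
    return match.group()


def proxy_in_use(direct_text: str, proxied_text: str) -> bool:
    """True when the proxied request shows a different IP than the direct one."""
    return first_ip(direct_text) != first_ip(proxied_text)


direct = "My External IP address – 203.0.113.5"
proxied = "My External IP address – 198.51.100.7"
print(proxy_in_use(direct, proxied))
```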

Using proxy servers for content retrieval
https://www.fivefilters.org/2021/proxy-server-support/
Mon, 26 Apr 2021 13:35:40 +0000

We've added proxy support to Feed Control in our latest update. This post will explain what it does, why you might need it, and how you can enable it.

The post Using proxy servers for content retrieval appeared first on FiveFilters.org.

We’ve added proxy support to Feed Control in our latest update. This post will explain what it does, why you might need it, and how you can enable it.

If you’re a user of our self-hosted Full-Text RSS or Feed Creator software, we’ll be covering how you can enable proxy support in those applications in the next post.

What’s a proxy server?

Proxy servers are used to route HTTP requests (e.g. requests for web pages) through different servers.

When you use our hosted applications (Feed Control, Full-Text RSS or Feed Creator) to fetch content from webpages, those requests go out from our servers in Germany (that’s where we host most of our web services). So when fetching content from example.org, the site will see that someone from Germany is requesting a web page. But it’s also possible to route the same request through a proxy server in the US, or some other country.

Why does it matter where a request originates?

Most of the time, it makes zero difference. A request from Germany will be treated exactly the same as a request from the US. There are situations, however, where it does make a difference.

Geofencing

With the introduction of GDPR in Europe, some sites in the US catering to local communities have decided it’s not worth the hassle to comply with European privacy laws when most of their audience is outside of Europe. They set up geofencing on their sites to refuse access to visitors outside of the US. When you access a site like this from Europe, you’ll often see a message stating that they cannot serve European visitors.

But what happens when someone from the US tries to use our Feed Control, Feed Creator, or Full-Text RSS service with such a site? The request will go out from one of our servers in Germany and will be rejected when it reaches the geofenced site. Regardless of where you live, when you request content via our services, all requests currently look to the target site as if they originate from Germany, because that’s where our servers are based. So certain content accessible to our users in the US won’t be accessible when requested via our services.

Rate limiting

There are also sites that limit the number of requests a single visitor (identified by IP address) can make within a certain timeframe. Such rate limits are usually in place for good reasons. They can prevent malicious activity or excessive requests that put too much strain on servers. But a sometimes unintended consequence of rate limiting is that requests that would be handled fine if made by users directly get rejected when they come from a limited pool of IP addresses belonging to a service acting on behalf of those same users. To the site receiving these requests, it can look like a handful of users making too many requests, rather than 100 or so users making a reasonable number of requests. You might have experienced something similar if you’ve ever used a VPN service and found yourself unable to load certain sites because of “too many requests”.
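
To see why this happens, here is a toy simulation in Python. It models a site that allows 10 requests per IP address in a single time window, then compares 100 users making 5 requests each from their own addresses with the same 500 requests arriving from 3 shared addresses. The limit, the numbers and the names are made up for illustration; real rate limiters are more sophisticated.

```python
from collections import defaultdict


class PerIpLimiter:
    """Allow at most max_requests per IP within one (fixed) time window."""

    def __init__(self, max_requests: int) -> None:
        self.max_requests = max_requests
        self.counts: dict[str, int] = defaultdict(int)

    def allow(self, ip: str) -> bool:
        self.counts[ip] += 1
        return self.counts[ip] <= self.max_requests


def rejected(request_ips: list[str], max_requests: int = 10) -> int:
    """Count how many of the given requests the limiter would turn away."""
    limiter = PerIpLimiter(max_requests)
    return sum(not limiter.allow(ip) for ip in request_ips)


# 100 users, 5 requests each, every user on their own IP address.
direct = [f"user-{u}" for u in range(100) for _ in range(5)]
# The same 500 requests, made on the users' behalf from 3 service IPs.
shared = [f"service-{i % 3}" for i in range(500)]

print("rejected when direct:", rejected(direct))
print("rejected via service:", rejected(shared))
```

The request volume is identical in both cases; only the number of source addresses changes.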

How does a proxy server help?

To access sites that enforce geofencing (in our experience, mostly US sites that refuse to serve European visitors), we can route requests through US proxy servers. Now the geofenced site sees a request from the US and no longer blocks it.

To handle the rate limiting issue above, a rotating proxy service can be used to distribute requests through a number of different servers, rather than one.
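
As a simplified picture of what rotation achieves, the Python sketch below spreads requests over a pool of exit addresses in round-robin order and counts how many each one ends up sending. Real rotating proxy services choose exit servers in their own way; the round-robin order, the pool size and the names here are assumptions.

```python
from collections import Counter
from itertools import cycle


def distribute(n_requests: int, exit_ips: list[str]) -> Counter:
    """Assign each request to the next exit IP in turn and tally the totals."""
    rotation = cycle(exit_ips)
    return Counter(next(rotation) for _ in range(n_requests))


pool = [f"exit-{i}" for i in range(50)]  # hypothetical pool of 50 exit servers
load = distribute(500, pool)
print("busiest exit IP sent", max(load.values()), "requests")
```

With 500 requests spread over 50 addresses, no single address sends more than 10, so a per-IP limit of that size is never exceeded.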

Proxy use in Feed Control

If you use our Feed Control application, we now let you enable proxy use for feeds you add to your account. When enabled, Feed Control will use a rotating proxy service to route requests through different US servers when fetching web content.

The feature is currently only available for two types of feeds in Feed Control:

  • Expanded feeds (when you enable ‘fetch full text’ to have article content retrieved from the source site)
  • Webpage to RSS feeds (feeds built with our Feed Creator application and then added to Feed Control)

In most cases, there will be no need to enable proxy use, so we suggest you try without it first and only enable if you have trouble. You can also contact us via the support link if you need assistance with a feed.

Enabling proxy use in Feed Control

It’s not yet possible to preview feed output with proxy use enabled without adding the feed to your account first (we’ll add support for that in a future update). So if you suspect the content you’re after is not being retrieved because of the issues listed above, you should add your feed in Feed Control’s management console and then enable proxy use.

To do that, follow the steps below:

  1. Log in to your Feed Control account
  2. From the left sidebar select Feeds
  3. Click Add Feed
  4. Paste the feed address into the URL field and click Add Feed
  5. In the Feed Details view that loads, click the Edit button
  6. In the Proxy field, select US Rotating
  7. Click Update Feed
  8. From the actions drop down, select Refresh feed
  9. Click the Feed items tab to see if new items appear (it might take a minute or so for the feed to refresh, so try refreshing the page if you don’t see anything immediately)

We currently limit the number of feeds on which you can enable proxy use based on your plan:

  • Standard – proxy use on up to 10 feeds
  • Plus – proxy use on up to 20 feeds
  • Business – proxy use on up to 50 feeds

If you need more than this, or if you have trouble with any feeds that you’d like us to take a look at, please contact us using the support link in Feed Control.

In the next post we’ll show you how to enable proxy use in our self-hosted software: Full-Text RSS and Feed Creator. We’ll show you how to configure our applications to use the Storm Proxies service, but any other proxy provider should work too.

PHP 8 fixes for Feed Creator and Full-Text RSS
https://www.fivefilters.org/2021/php-8-fixes-for-feed-creator-and-full-text-rss/
Sun, 07 Mar 2021 01:22:26 +0000

New versions of Feed Creator and Full-Text RSS are now available for self hosting. They fix problems users reported when running the software with PHP 8.

The post PHP 8 fixes for Feed Creator and Full-Text RSS appeared first on FiveFilters.org.

New versions of Feed Creator and Full-Text RSS are now available for self hosting. They fix problems users reported when running the software with PHP 8.

Feed Creator 2.2.1 and Full-Text RSS 3.9.11 can be downloaded from the customer portal.

Please let us know if you experience any issues.

Changelogs

Full-Text RSS 3.9.7
https://www.fivefilters.org/2021/full-text-rss-397/
Thu, 11 Feb 2021 19:00:00 +0000

Full-Text RSS 3.9.7 is now available. Full-Text RSS is used by software developers and news enthusiasts to extract article content from news sites and blogs, and to convert RSS feeds that contain only extracts of stories into full-text feeds. Existing customers can download the latest version through our customer login.

The post Full-Text RSS 3.9.7 appeared first on FiveFilters.org.

Update – 6 March 2021: PHP 8 bug fix: Warnings produced processing some sites (fixed in ContentExtractor.php). Please download version 3.9.11 from our customer portal.

Update – 2 March 2021: Improved JSON-LD extraction and fixed warnings generated with PHP 8 on some sites. Please download version 3.9.10 from our customer portal.

Update – 26 February 2021: Improved JSON-LD handling and fixed a bug with extracted JSON-LD elements not getting cleared between item fetches on a feed. Please download version 3.9.9 from our customer portal.

Update – 13 February 2021: Some users experienced problems with version 3.9.7 fetching content when running it on servers with a slightly older version of OpenSSL. If you had trouble, please download version 3.9.8 from our customer portal to fix the issue.

Full-Text RSS version 3.9.7 is now available. Full-Text RSS is used by software developers and news enthusiasts to extract article content from news sites and blogs, and to convert RSS feeds that contain only extracts of stories into full-text feeds.

Existing customers can download the latest version through our customer portal.

What’s changed in 3.9.7?

You’ll find a full changelog at the end, but here are the main changes.

PHP 8 compatible

We tested this release with PHP 8 and removed code that was deprecated.

JSON-LD aware

JSON-LD, or JavaScript Object Notation for Linked Data, is used by publishers to embed machine-readable data about articles on their websites. It’s become common practice in recent years.

We’ve noticed cases where the data that Full-Text RSS previously looked to extract from meta tags or HTML elements inside the body are, on some sites, only found inside JSON-LD objects. As such, we’ve added code to our content extractor in this version to look inside these objects for the article title, author and date.
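
To give a sense of what looking inside these objects involves, here is a short Python sketch that finds script elements of type application/ld+json in a page and reads the schema.org headline, author and datePublished properties. It is an illustration only: Full-Text RSS is written in PHP and its extractor handles far more cases, and the class and function names below are ours.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self._buffer: list[str] = []
        self.blocks: list = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self._buffer = []

    def handle_data(self, data):
        if self._inside:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._inside:
            self._inside = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # ignore malformed JSON-LD


def article_metadata(html: str):
    """Return title, author and date from the first JSON-LD object with a headline."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    for block in collector.blocks:
        for item in block if isinstance(block, list) else [block]:
            if isinstance(item, dict) and "headline" in item:
                author = item.get("author")
                if isinstance(author, list):
                    author = author[0] if author else None
                if isinstance(author, dict):
                    author = author.get("name")
                return {
                    "title": item["headline"],
                    "author": author,
                    "date": item.get("datePublished"),
                }
    return None
```

A page whose head contains a NewsArticle object with those three properties would yield a dictionary with title, author and date; pages without JSON-LD return None, at which point an extractor would fall back to meta tags or HTML elements.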

Automatic site config updates fixed

Automatic site config updates through the Full-Text RSS admin pages stopped working recently. We rely on GitHub for these and the HTTP headers we looked for had changed, so we’ve now updated our code to handle it.

SubToMe.com offers a convenient way to subscribe to a feed in your favourite feed reader. In previous versions we used it in the feed preview and in an <atom:link rel="related"...> element in the feed output. Unfortunately it hasn’t been updated since 2015, so some of its subscribe endpoints (e.g. Feedly’s) are no longer valid.

We now point users to our own subscribe page. Here’s an example: https://subscribe.fivefilters.org/?name=FiveFilters.org&url=https%3A%2F%2Fblog.fivefilters.org%2Ffeed.xml

Note: this is currently hardcoded to use subscribe.fivefilters.org. We plan to offer this as part of the Full-Text RSS package in the future.
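
If you want to generate such subscribe links yourself, the example URL above suggests the page takes a name and a URL-encoded feed address. Here is a minimal Python sketch based on that one example; the function name is ours.

```python
from urllib.parse import urlencode


def subscribe_url(name: str, feed_url: str) -> str:
    """Build a subscribe.fivefilters.org link for the given feed."""
    return "https://subscribe.fivefilters.org/?" + urlencode({"name": name, "url": feed_url})


print(subscribe_url("FiveFilters.org", "https://blog.fivefilters.org/feed.xml"))
```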

Installing on a VPS

If you’d like to try the new version out on a VPS, we can recommend Hetzner Cloud (free €20 credit with link if you’re new to Hetzner). Please see our installation instructions.

Feedback

Please let us know if you have any trouble using this new version.

Changelog

Full-Text RSS 3.9.10 (2021-03-02)

  • Improve JSON-LD extraction and fix warnings generated with PHP 8 on some sites
  • Minor improvements and API parameter description updates

Full-Text RSS 3.9.9 (2021-02-26)

  • Better handling of JSON-LD elements
  • Bug fix: JSON-LD extracted elements not cleared between item fetches on a feed

Full-Text RSS 3.9.8 (2021-02-13)

  • Bug fix: Full-Text RSS failed to fetch content on some servers with older versions of OpenSSL.

Full-Text RSS 3.9.7 (2021-02-11)

  • Bug fix: Item titles (and feed title and description) no longer double-encoded when they contain characters that need encoding
  • Bug fix: Automatic site config updates (if configured via admin page) stopped working due to GitHub changes
  • Bug fix: Query string param ‘&images=0’ to remove images from output should now work correctly
  • Bug fix: Proxy servers (with auth) and PECL HTTP extension should now work correctly if configured in config file
  • Look inside JSON-LD elements to extract title, author, date (use ‘skip_json_ld: yes’ in site config file to disable)
  • Compatible with PHP 8
  • HTML5-PHP library updated to version 2.7.4
  • SimplePie library updated to version 1.5.6
  • Change subtome.com URLs in the <atom:link rel="related" href="[url]"> attribute to subscribe.fivefilters.org (uses more recent feed reader subscribe endpoints)
  • Minor fixes

Feed Control, Full-Text RSS, Feed Creator: Which to choose?
https://www.fivefilters.org/2020/feed-control-comparison/
Mon, 28 Dec 2020 09:31:00 +0000

We now offer a few products for working with feeds. This post looks at the differences between them to help you choose the right one.

The post Feed Control, Full-Text RSS, Feed Creator: Which to choose? appeared first on FiveFilters.org.

Since launching Feed Control, some of you have asked how it compares to Full-Text RSS and Feed Creator. This post will try to answer that.

TLDR: If you’re not a developer, and have no need to run our tools on your own server, choose Feed Control. If you’re a developer, read on.

Our feed products are used in combination with feed reading applications such as Feedly, Newsblur, Fraidycat, and many others, but also by developers who need custom integrations with their own applications and workflows, usually in relation to monitoring and extracting information from blogs and news publications.

We’ll be taking both types of use into account when comparing the products, but because the solutions differ more when evaluated from a developer perspective, we’ll try to focus more on that angle when comparing. If you’re not a developer, you can ignore the parts aimed at developers.

Full-Text RSS

Full-Text RSS is our feed expansion application. It takes a partial feed (e.g. a feed which only contains a short summary of each article) and converts it to a full-text feed by pulling in the full article content for each item.

If you enjoy reading full articles within your news reading application and not having to click into the site itself, Full-Text RSS can help.

If you’re a developer, Full-Text RSS can also be used to extract article content from individual articles. Instead of giving it a feed URL, give it an article URL and it will try to extract the article content and return it along with additional information that might be useful (e.g. language, author).

Full-Text RSS can be used as a hosted service run by us, or bought to run on your own server. We also offer the service to developers via RapidAPI, which is a quick way to start integrating it into your own application.

Feed Creator

Feed Creator is our feed creation application. It has two main uses:

  1. Creating feeds from web pages which don’t offer their own
  2. Filtering or merging existing feeds

To create a feed from a web page, you give it the web page URL and some selectors for the content you want extracted. When filtering or merging existing feeds, you give it feed URLs and keywords or URL segments to use as filters.
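
To make the filtering idea concrete, here is a small Python sketch of keyword and URL-segment filtering over feed items. It is not Feed Creator's implementation: the item structure (dictionaries with title and link keys), the function name and the rule of keeping an item when any filter matches are all assumptions for illustration.

```python
from typing import Iterable, Sequence


def filter_items(
    items: Iterable[dict],
    keywords: Sequence[str] = (),
    url_segments: Sequence[str] = (),
) -> list[dict]:
    """Keep items whose title contains a keyword or whose link contains a segment."""
    kept = []
    for item in items:
        title = item.get("title", "").lower()
        link = item.get("link", "")
        title_match = any(keyword.lower() in title for keyword in keywords)
        link_match = any(segment in link for segment in url_segments)
        if title_match or link_match:
            kept.append(item)
    return kept


items = [
    {"title": "New RSS reader released", "link": "https://example.org/technology/rss-reader"},
    {"title": "Local election results", "link": "https://example.org/politics/results"},
]
print(filter_items(items, keywords=["rss"]))
print(filter_items(items, url_segments=["/politics/"]))
```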

Feed Creator can be used as a hosted service run by us, or bought to run yourself on your own server.

Feed Control

Feed Control builds on both of the above. You can use it to create full-text feeds and also generate feeds from a web page.

Full-Text RSS and Feed Creator are more bare-bones compared to Feed Control. They have been developed to be fast, efficient, and stateless, with a small server footprint. For developers, they are ideally used as microservices, and can easily be set up on a server and scaled.

While that approach has benefits, keeping those applications small and lean – which we’re committed to doing – does also limit us in what we can implement as part of each service. Feed Control is our effort in making both those tools a little more accessible and adding features many of our users have requested over the years.

Below, we’re going to look at some of the features of Feed Control that are currently not available in Full-Text RSS and Feed Creator. (We have plans for more, which we’ll be covering here in due time.)

Note: Feed Control is currently only available as a hosted service. We have no plans yet to offer a self-hosted version.

User interface for managing feeds

Feed Control

Create an account to store your feeds. View and manage feeds from the admin interface.

Full-Text RSS / Feed Creator

There is no record of the feeds you’ve created.

Faster feed access via a CDN

Feed Control

When you enable RSS or JSON generation for a feed, the resulting file is stored on a content delivery network (CDN) for fast access. This is automatically updated as new items for the feed are pulled in.

Full-Text RSS / Feed Creator

Full-Text RSS and Feed Creator process and generate feeds on an ad-hoc basis, as requests come in (with some caching to increase performance). This approach can result in delays when returning content, especially if the source feed is on a server that’s slow to respond.

Twitter feeds

Feed Control

Monitor and generate feeds from a user’s Twitter timeline, with or without retweets.

Full-Text RSS / Feed Creator

Not available.

Email alerts

Feed Control

Enable email alerts for a feed to receive a notification for each new item detected, or a daily, weekly, or monthly summary.

Full-Text RSS / Feed Creator

Not available.

Webhooks

Webhooks are intended for developers.

Feed Control

Enable webhooks for a feed, and Feed Control will send your application data for each new item. It will send the original HTML; a stripped-down, sanitized version; and plain text.

It’s a great alternative to polling feeds for updates in your application, and also a nice solution for serverless setups. Read more about webhooks in our documentation, including basic code examples.
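To give a feel for the receiving side, here is a minimal sketch of how an application might choose between the three representations of an item. It assumes the webhook body is JSON, and the three key names are hypothetical placeholders; the real payload format is described in the webhook documentation.

```python
import json

# Hypothetical key names: the real webhook payload may use different ones.
HTML_KEY = "content_html"
SANITIZED_KEY = "content_html_sanitized"
TEXT_KEY = "content_text"


def pick_content(body, prefer=(TEXT_KEY, SANITIZED_KEY, HTML_KEY)):
    """Parse a webhook body and return (key, value) for the first
    representation in `prefer` that is present and non-empty.

    Returns (None, "") when none of the representations is available.
    """
    item = json.loads(body)
    for key in prefer:
        value = item.get(key)
        if value:
            return key, value
    return None, ""


if __name__ == "__main__":
    sample = json.dumps({HTML_KEY: "<p>Hi <b>there</b></p>", TEXT_KEY: "Hi there"})
    print(pick_content(sample))
```

A function like this could sit behind any HTTP endpoint that accepts POST requests, which is why the approach suits serverless setups.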

Full-Text RSS / Feed Creator

Not available.

Generate stripped-down HTML

Feed Control

Feed items that contain a lot of HTML with non-essential elements or styling can be stripped down further to improve display. You can turn this on when enabling feed generation in Feed Control.

Full-Text RSS / Feed Creator

Not available.

Filtering

Feed Control

Set up filters to only include items of interest, e.g. ignore a tweet unless it mentions ‘covid’, or alternatively, ignore tweets that do mention ‘covid’. You can filter on the item URL too, e.g. ignoring items unless they contain the segment ‘/news/’ in the URL.
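The logic of such filters can be sketched in a few lines. This is only an illustration of the idea, not how Feed Control implements it: the item structure (a dictionary with `title`, `content`, and `url` keys) and the case-insensitive keyword matching are both assumptions.

```python
def keep_item(item, must_contain=None, must_not_contain=None, url_segment=None):
    """Decide whether a feed item passes a set of simple filters.

    item: dict with optional 'title', 'content' and 'url' keys (assumed shape).
    must_contain: keep the item only if this keyword appears in the text.
    must_not_contain: drop the item if this keyword appears in the text.
    url_segment: keep the item only if its URL contains this segment.
    """
    text = f"{item.get('title', '')} {item.get('content', '')}".lower()
    if must_contain and must_contain.lower() not in text:
        return False
    if must_not_contain and must_not_contain.lower() in text:
        return False
    if url_segment and url_segment not in item.get("url", ""):
        return False
    return True


if __name__ == "__main__":
    items = [
        {"title": "Covid update", "url": "https://example.org/news/1"},
        {"title": "Football scores", "url": "https://example.org/sport/2"},
    ]
    print([i["title"] for i in items if keep_item(i, must_contain="covid")])
    print([i["title"] for i in items if keep_item(i, url_segment="/news/")])
```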

Full-Text RSS / Feed Creator

Filtering is not available in Full-Text RSS, but is available in Feed Creator.

Try for free

We think Feed Control is the best place to start for most users, but you can try all three for free:

Feedback or questions?

If you have any questions or feedback about our tools, or are unsure which one’s right for you, feel free to get in touch on our forum, by email, or on Twitter.

The post Feed Control, Full-Text RSS, Feed Creator: Which to choose? appeared first on FiveFilters.org.

Full-Text RSS 3.9.6: PHP 7.4 compatible https://www.fivefilters.org/2020/full-text-rss-396/ Sat, 25 Apr 2020 07:49:00 +0000

Full-Text RSS version 3.9.6 is now available. Full-Text RSS is used by software developers and news enthusiasts to extract article content from news sites and blogs, and to convert RSS feeds that contain only extracts of stories to full-text feeds.

Existing customers can download the latest version through our customer login.

What’s changed in 3.9.6?

This is mainly a maintenance release. The new version is now compatible with PHP 7.4. You’ll find a full changelog at the end, but here are the main changes.

PHP 7.4 compatible

We removed code that was deprecated in PHP 7.4 and tested this release with PHP 7.4.

Server initialisation script for Ubuntu 20.04

Ubuntu 20.04 was released a few days ago. In this release we’ve added a server initialisation script for this version of Ubuntu. It’s a Puppet file called ubuntu-20.04.pp. The file should be run on a new instance of Ubuntu 20.04 and will install PHP 7.4, Apache, and all the required components for Full-Text RSS 3.9.6.

The easiest way to get Full-Text RSS set up on a new server with Ubuntu 20.04 is to log in to our customer portal and use the ‘Easy Install’ option next to Full-Text RSS.

If you’d like a somewhat more manual approach, you can initialise a new VPS instance running Ubuntu 20.04 with the following commands (please don’t run them on an existing server):

apt-get -y install puppet
wget https://bitbucket.org/fivefilters/hosting/raw/master/ubuntu-20.04.pp
puppet apply ubuntu-20.04.pp

Full-Text RSS can now be extracted into the /var/www/html/ folder.

Full changelog

  • Compatible with PHP 7.4
  • New Puppet server initialisation script for Ubuntu 20.04 (ubuntu-20.04.pp)
  • HTML5-PHP library updated to version 2.7
  • SimplePie library updated to version 1.5.4
  • Minimum PHP version is now 5.6
  • Minor fixes

The post Full-Text RSS 3.9.6: PHP 7.4 compatible appeared first on FiveFilters.org.

Full-Text RSS 3.9.5 https://www.fivefilters.org/2019/full-text-rss-395/ Fri, 29 Mar 2019 12:30:00 +0000

Full-Text RSS version 3.9.5 is now available. Full-Text RSS is used by software developers and news enthusiasts to extract article content from news sites and blogs, and to convert RSS feeds that contain only extracts of stories to full-text feeds.

Existing customers can download the latest version through our customer login.

What’s changed in 3.9.5?

You’ll find a full changelog at the end, but here are the main changes.

HTML parsing and character encoding

The main change in this release affects users of our hosted service and also self-hosted users who ran our server initialisation script to set up a new server. We now only parse with HTML5-PHP rather than Gumbo PHP. We’ve done this because in some situations the latter produced results where certain characters were double-encoded. We hope to fix this for future releases.

If the character encoding issue didn’t affect you and you want to continue with Gumbo parsing, you will have to edit your config file (look for $options->allowed_parsers and $options->default_parser).

PHP 7.3 compatible

We removed code that was deprecated in PHP 7.3 and tested this release with PHP 7.3.

Feed preview in Firefox

Firefox users might have noticed that the feeds we produced did not display well in the browser. We use XSL and CSS stylesheets to get the browser to render feeds more nicely, rather than simply display the raw XML. Firefox does not support the disable-output-escaping XSL attribute we relied on, so we’re using JavaScript code by Sean M. Burke to handle this. This change does not affect the actual RSS, only how it’s presented in certain browsers like Firefox.

Because of the EU General Data Protection Regulation (GDPR), some sites have put up cookie notices on their pages notifying you of their use of cookies and asking you to accept. Some sites go further and put up cookie walls in front of all EU visitors. These are warnings displayed to visitors prompting them to accept tracking of some sort before they can start reading the content they want to read. Others go further still and flatly refuse to load content for EU visitors.

If you install Full-Text RSS on a server in an EU country, Full-Text RSS will also be treated as an EU visitor. Because Full-Text RSS acts as your proxy here and has no intention of tracking you or helping other sites track you, we have rules to pass through many cookie walls and give you the content you’re after. This approach works well, but is limited in that it requires knowledge of how a site has implemented its cookie wall. As such, you may still encounter sites that won’t work when Full-Text RSS is installed on an EU server.

So, as much as we think the GDPR is a good move, to make sure Full-Text RSS is able to work with the widest range of sites, our suggestion is to install Full-Text RSS on a server outside the EU, or to configure Full-Text RSS to use a non-EU proxy. We’ll be updating our hosting page with recommendations soon, as well as moving our hosted Full-Text RSS instances outside of the EU.

Full changelog

  • Bug fix: Character encoding issues (now using bundled HTML5-PHP parser by default)
  • Bug fix: RSS preview broken in Firefox (preview stylesheet updated)
  • Bug fix: Google Alert feeds not producing results (meta refresh handling updated)
  • Bug fix: srcset relative URLs now rewritten to absolute form (in line with img src)
  • Bug fix: disable XSS filtering in extract.php when &xss=0
  • Updated default User-Agent strings used
  • HTML5-PHP library updated to version 2.6
  • Updated server setup script (ubuntu-18.04.pp) to use newer versions of PECL HTTP and APCu
  • Deprecated $options->allowed_urls in favour of $options->allowed_hosts in config.php
  • Removed deprecated filter_var flags for PHP 7.3 compatibility
  • Tested with PHP 7.3

The post Full-Text RSS 3.9.5 appeared first on FiveFilters.org.

Full-Text RSS 3.9 https://www.fivefilters.org/2018/full-text-rss-39/ Sun, 06 May 2018 21:44:28 +0000

[See update at bottom about 3.9.1]

Full-Text RSS 3.9 is now available. Full-Text RSS is used by software developers and news enthusiasts to extract article content from news sites and blogs, and to convert RSS feeds that contain only extracts of stories to full-text feeds. Existing customers can download the latest version through our customer login.

Command-line use

It’s now possible to use Robo Task Runner to initialise Full-Text RSS and carry out common operations. After installing Robo, change into the Full-Text RSS folder and try the following:

  • Initialise: enable caching and site config updates

    robo init

  • Update site config files

    robo update:site-config-files

  • Get article title

    robo title [url]

  • Get article contents as plain text

    robo text [url]

  • Get article contents as HTML

    robo html [url]

Note: Full-Text RSS still needs to be set up as before (as a web service) for these to work.

Plain text output

It’s now possible to request plain text by passing &content=text as a request parameter. The output format will not change when passing this, but where we had article HTML before, we’ll now have plain text.

Easier installation for customers

We also now offer customers a way to install Full-Text RSS by running a Cloud-Init script when setting up a new VPS instance. If you’d like to make use of it, log in to our member page after buying Full-Text RSS and you’ll find instructions there.

The steps will initialise a new Ubuntu 18.04 server with Apache and PHP 7.2, install Full-Text RSS, enable caching, update the site config files and set up cron to update them again periodically.

Full changelog

  • Convert extracted HTML content to plain text with &content=text
  • Bug fix: character encoding issue when using Gumbo as parser
  • Removed deprecated PHP 7.2 calls (thanks Florian)
  • Added a few basic Codeception tests (see tests folder)
  • Added RoboFile.php to allow use from the command line (with Robo Task Runner)
  • Removed $options->html5_output from config file (it’s now always used)
  • Other fixes/improvements

Available to try and buy

Full-Text RSS 3.9 is now available to buy. If you’re an existing customer, you can download the latest version from our member page or upgrade at a discount.

You can also test the software before buying. This test copy will only be up until 21 May 2018. After that you can test using our free, hosted version (some features disabled).

Update May 11 2018

We released Full-Text RSS 3.9.1 yesterday to address a few issues with the 3.9 release. The changelog for that release is below:

  • Bug fix: Removed deprecated each() calls (caused ‘deprecated’ PHP warnings in some versions of PHP)
  • Bug fix in Gumbo PHP parser: Preserve whitespace in <pre> elements wrapping <code> elements (change in ubuntu-18.04.pp affecting those who set up their server with the previous Puppet file, or via our customer Easy Install page)
  • Replaced ubuntu-16.04.pp Puppet file with ubuntu-18.04.pp
  • Specify text width in plain text output: &content=text80 will wrap at 80 characters (default is 70)

If you installed 3.9, we recommend you re-install if any of the following apply to you:

  • You used our ‘Easy Install’ steps on our member pages or used our server initialisation Puppet file to set up your server. For 3.9.1 we updated the Gumbo PHP extension to handle whitespace better in <pre> elements. If this applies to you, you will need to re-run our Easy Install setup or apply our Puppet file to initialise a new server. (The scripts will not help if applied to an already configured server.)

  • You get PHP 7.2 deprecation warnings (or an XML error) when you try to create a feed.

  • You use our &content=text output and want to be able to specify text width: &content=text80 to wrap at 80 characters rather than 70 (default). Or &content=text0 to disable text wrapping.
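The width variants of the parameter can be illustrated with a short sketch. The first function simply builds the parameter string as described above. The second uses Python's `textwrap` module as a stand-in to show what wrapping at a given width means; it is not the algorithm Full-Text RSS uses, and both function names are made up for this example.

```python
import textwrap


def content_param(width=None):
    """Return the request parameter for plain text output.

    width=None leaves the default in place (70 characters, per the notes
    above), 0 disables wrapping, any other value wraps at that width.
    """
    if width is None:
        return "content=text"
    if width < 0:
        raise ValueError("width must be zero or positive")
    return f"content=text{width}"


def wrap_plain_text(text, width=70):
    """Stand-in illustration of wrapping: paragraphs are separated by blank
    lines and each one is wrapped independently. width=0 leaves text as-is."""
    if width == 0:
        return text
    paragraphs = text.split("\n\n")
    return "\n\n".join(textwrap.fill(p, width=width) for p in paragraphs)


if __name__ == "__main__":
    print(content_param(), content_param(80), content_param(0))
    print(wrap_plain_text("An example sentence. " * 10, width=80))
```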

The post Full-Text RSS 3.9 appeared first on FiveFilters.org.

Full-Text RSS 3.8 https://www.fivefilters.org/2017/full-text-rss-38/ Mon, 25 Sep 2017 01:49:06 +0000

Full-Text RSS 3.8 is now available. Full-Text RSS is used by software developers and news enthusiasts to extract article content from news sites and blogs, and to convert RSS feeds that contain only extracts of stories to full-text feeds. This is mostly a maintenance release, with a few new additions. Existing customers can download the latest version through our customer login.

New site config options

Site config files are used if additional rules are required to extract a site’s content properly. Here’s an example.

This update adds two new directives that can be used in these files:

strip_attr: XPath

Remove attributes from elements. Example:

strip_attr: //img/@srcset
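To show the effect of a rule targeting //img/@srcset, here is a small sketch using Python's standard library. It is not the Full-Text RSS implementation: ElementTree does not support XPath attribute selectors, so the sketch takes a tag name and an attribute name instead, and it assumes well-formed markup.

```python
import xml.etree.ElementTree as ET


def strip_attribute(root, tag, attr):
    """Remove `attr` from every `tag` element under `root`.

    Returns the number of attributes removed. This mimics the effect of an
    attribute selector such as //img/@srcset without using XPath.
    """
    removed = 0
    for element in root.iter(tag):
        if element.attrib.pop(attr, None) is not None:
            removed += 1
    return removed


if __name__ == "__main__":
    doc = ET.fromstring(
        '<div><img src="a.jpg" srcset="a-2x.jpg 2x"/><p>Caption</p></div>'
    )
    print(strip_attribute(doc, "img", "srcset"))
    print(ET.tostring(doc, encoding="unicode"))
```

The element itself and its other attributes (here `src`) are left untouched; only the selected attribute goes.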

insert_detected_image: yes|no

If the extracted content contains no images, we’ll look for the og:image element and insert that image into the content block. This is on by default. On sites where this image is not useful (not related to the content), this directive can be used to turn off the feature. Example:

insert_detected_image: no
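The behaviour described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual implementation: the function names are made up, and placing the image at the start of the content block is a guess, since the post only says the image is inserted into the content.

```python
from html import escape
from html.parser import HTMLParser


class _Scan(HTMLParser):
    """Records whether any <img> was seen and the first og:image value."""

    def __init__(self):
        super().__init__()
        self.has_img = False
        self.og_image = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img":
            self.has_img = True
        elif (tag == "meta" and self.og_image is None
              and attributes.get("property") == "og:image"):
            self.og_image = attributes.get("content")


def _scan(markup):
    scanner = _Scan()
    scanner.feed(markup)
    return scanner


def insert_detected_image(page_html, content_html, enabled=True):
    """Prepend the page's og:image to the extracted content when the
    content has no images of its own and the feature is enabled."""
    if not enabled or _scan(content_html).has_img:
        return content_html
    image_url = _scan(page_html).og_image
    if not image_url:
        return content_html
    return f'<p><img src="{escape(image_url, quote=True)}" /></p>{content_html}'


if __name__ == "__main__":
    page = '<head><meta property="og:image" content="https://example.org/lead.jpg"></head>'
    print(insert_detected_image(page, "<p>Story text</p>"))
```

Passing `enabled=False` corresponds to setting the directive to no for a site whose og:image is unrelated to the article.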

PHP compatibility

This version has been tested with PHP 7.2 RC1. The minimum version of PHP required is now 5.4.

Full changelog

  • New site config directive: strip_attr: XPath attribute selector (e.g. //img/@srcset) – remove attribute from element
  • New site config directive: insert_detected_image: yes/no (default yes) – places image in og:image in the body if no other images extracted
  • Bug fix: Better handling of Internationalized Domain Names (IDNs)
  • Bug fix: Relative base URLs (<base>) now resolved against page URL
  • Bug fix: Wrong site config file chosen in certain cases (when wildcard and exact subdomain files available and cached in APCu)
  • Bug fix: &apos; HTML entities not converted correctly when parsing with Gumbo PHP
  • Remove srcset (+ sizes) attributes on img elements if it looks like they only contain relative URLs (browser will use src attribute value instead)
  • https:// URLs now re-written to sec:// before being submitted to avoid overzealous security software blocking request on some servers – no redirect, only affects newly submitted URLs on index.php
  • HTML5-PHP library updated
  • Language Detect library updated
  • Site config files updated for better extraction
  • Minimum PHP version is now 5.4. If you must use PHP 5.3, please stick with Full-Text RSS 3.7
  • Tested with PHP 7.2
  • Other fixes/improvements
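The base URL fix in the changelog can be illustrated with Python's `urljoin`. This is a sketch of the general resolution rule, not the code in Full-Text RSS: a relative <base> value is first resolved against the page URL, and links in the page are then resolved against the result. The function names are made up for this example.

```python
from urllib.parse import urljoin


def effective_base(page_url, base_href=None):
    """Resolve a (possibly relative) <base href> against the page URL.

    With no <base> element, the page URL itself is the base.
    """
    return urljoin(page_url, base_href) if base_href else page_url


def absolutise(page_url, link, base_href=None):
    """Turn a relative link from the page into an absolute URL."""
    return urljoin(effective_base(page_url, base_href), link)


if __name__ == "__main__":
    page = "https://example.org/news/story.html"
    print(absolutise(page, "img/a.jpg"))
    print(absolutise(page, "img/a.jpg", base_href="/assets/"))
```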

Available to try and buy

Full-Text RSS 3.8 is now available to buy. If you’re an existing customer, you can download the latest version from our member page or upgrade at a discount.

You can also test the software before buying. This test copy will only be up until 10 October 2017. After that you can test using our free, hosted version (some features disabled) or contact us to get access to a regular installation of the software.

The post Full-Text RSS 3.8 appeared first on FiveFilters.org.
