Tag Archives: WebScraper

WebScraper 4.15.6

Name: WebScraper_4.15.6__HCiSO_Mactorrents.biz.dmg
Size: 4.91 MB
Files WebScraper_4.15.6__HCiSO_Mactorrents.biz.dmg[4.91 MB]

   Download Torrent

WebScraper uses the Integrity v8 engine to quickly scan a website, and can output extracted data (currently) as CSV or JSON. Plus download images to a folder.

  • Easy to scan a site – just enter the starting URL and press “Go”
  • Easy to export – choose the columns you want
  • Plenty of extraction options, including HTML elements with certain classes or IDs, regular expressions, or entire content in a number of formats (html, plain text, markdown)
  • Since v4.1 can download to a folder all images discovered
  • Configuration of various limits on the crawl and the output file size
  • More…

Compatibility: macOS 10.13 or later • Apple Silicon or Intel Core processor
Homepage https://peacockmedia.software/mac/webscraper/

Screenshots


WebScraper 4.10.2

WebScraper 4.10.2

WebScraper uses the Integrity v8 engine to quickly scan a website, and can output extracted data (currently) as CSV or JSON. Plus download images to a folder.

  • Easy to scan a site – just enter the starting URL and press “Go”
  • Easy to export – choose the columns you want
  • Plenty of extraction options, including HTML elements with certain classes or IDs, regular expressions, or entire content in a number of formats (html, plain text, markdown)
  • Since v4.1 can download to a folder all images discovered
  • Configuration of various limits on the crawl and the output file size
  • More…

What’s New:

Version 4.10.2:

  • Small fix to the ‘split into multiple files if output exceeds 64,000 rows’ option
  • Splitting files at 64000 rows now takes account of the header row

Compatibility: OS X 10.8 or later, 64-bit processor
Homepage https://peacockmedia.software/mac/webscraper/

Screenshots


Name: WebScraper_4.10.2__TNT_Mactorrents.io.dmg
Size: 6.4 MB
Files WebScraper_4.10.2__TNT_Mactorrents.io.dmg[6.4 MB]

   Download

WebScraper 4.2.1

WebScraper 4.2.1

Name: WebScraper Mac
Version: 4.2.1
Developer: Shiela Dixon
Mac Platform: Intel
OS Version: OS X 10.8 or later
Processor type(s) & speed: 64-bit processor

Includes: Pre-K’ed (TNT)

Web Site: http://peacockmedia.co.uk/

Overview

Quickly extract information related to a certain webpage, including the text content, by using a minimalist app that exports the data to JSON or CSV

WebScraper offers you the possibility to quickly extract content from an online source with minimal effort. You have full control over the data that will be exported to the CSV or JSON files.

Quickly scan any website by using multiple threads

Within the WebScraper main window you must specify the URL address of the webpage you want to scan, and the number of threads that are to be used to complete the procedure. You get to adjust the latter parameter with the help of a simple slider bar.

To avoid any unnecessary scanning, you can choose to crawl only a single page, and then start the process with a simple mouse click. In the Live View window, you get to see the status message returned by each link, which might prove useful when dealing with debugging tasks.

Extract various types of information and export the data to CSV or JSON

In the WebScraper Output panel, you get to choose the type of information you want the utility to extract from a web page: the URL, the title, the description, content associated with a different class or ID, the headings, the page content in various formats (plain text, HTML or Markdown) and the last modified date.

You also get to choose the output file format (CSV or JSON), decide to consolidate white spaces, and set an alert if the file exceeds a certain size. If you are opting for the CSV format, you get to pick when to use quotes around columns, what to adopt instead of quotes, or the line separator type.

Last but not least, WebScraper also allows you to change the user-agent, to set a limit for the number of links and the clicks from home, can ignore query strings, and may treat subdomains of root domain as internal pages.

Effortlessly crawl information from online sources without too much user interaction

WebScraper offers you the possibility to quickly scan websites and output their content, together with other additional medatada, to CSV of JSON files. The tool is great whenever you want to have offline access top the data without having to store the entire page.

What’s new in WebScraper 4.2.1

  • Adds option to ignore and / when extracting content as plain text / markdown (defaults for a new project are to *include* the contents of the nav, header and footer)
  • Fixes bug in engine relating to the ‘output filter’ (only scraping data from pages containing or not containing X)
  • Slight change to the way that ‘ignore urls containing’ under scan works. It wasn’t ignoring these ulrs completely, but ‘not following’. Practically this probably makes little difference but the operation more accurately matches the wording now, and may make the scan more efficient.
  • too much information was being sent to the console in recent version(s), this is tidied up a bit.

Screenshots

Name WebScraper 4 2 1 TNT.zip
Size 5.35 MB
Created on 2018-05-08 03:14:59
Hash c3e801360f8a7059c5dccbf699802bef6e908dde
Files WebScraper 4 2 1 TNT.zip (5.35 MB)

WebScraper 4.1.0

Descriptions for WebScraper 4.1.0

Name: WebScraper
Version: 4.1.0
Developer: Shiela Dixon
Mac Platform: Intel
OS Version: OS X 10.8 or later
Processor type(s) & speed: 64-bit processor

Includes: Pre-K’ed (TNT)

Web Site: http://peacockmedia.co.uk/

Overview

Quickly extract information related to a certain webpage, including the text content, by using a minimalist app that exports the data to JSON or CSV

WebScraper offers you the possibility to quickly extract content from an online source with minimal effort. You have full control over the data that will be exported to the CSV or JSON files.

Quickly scan any website by using multiple threads

Within the WebScraper main window you must specify the URL address of the webpage you want to scan, and the number of threads that are to be used to complete the procedure. You get to adjust the latter parameter with the help of a simple slider bar.

To avoid any unnecessary scanning, you can choose to crawl only a single page, and then start the process with a simple mouse click. In the Live View window, you get to see the status message returned by each link, which might prove useful when dealing with debugging tasks.

Extract various types of information and export the data to CSV or JSON

In the WebScraper Output panel, you get to choose the type of information you want the utility to extract from a web page: the URL, the title, the description, content associated with a different class or ID, the headings, the page content in various formats (plain text, HTML or Markdown) and the last modified date.

You also get to choose the output file format (CSV or JSON), decide to consolidate white spaces, and set an alert if the file exceeds a certain size. If you are opting for the CSV format, you get to pick when to use quotes around columns, what to adopt instead of quotes, or the line separator type.

Last but not least, WebScraper also allows you to change the user-agent, to set a limit for the number of links and the clicks from home, can ignore query strings, and may treat subdomains of root domain as internal pages.

Effortlessly crawl information from online sources without too much user interaction

WebScraper offers you the possibility to quickly scan websites and output their content, together with other additional medatada, to CSV of JSON files. The tool is great whenever you want to have offline access top the data without having to store the entire page.

What’s new in WebScraper 4.1.0

April 19th, 2018

  • Adds capability of downloading images to a folder during the scan. See Complex setup > Output file columns > Also download images to folder. Images can optionally be downloaded only if they match a pattern, either partial url or regex match. (leave box blank to download all images discovered)
  • Adds option to filter output file – ie only include data in output file from certain pages (eg information pages or product pages). This is done by matching the url of the page being scraped, either by partial url (eg /product/) or a regex match
  • Fixes issue with saving project. (note that saving project does not save data, only settings and configuration. Save data separately using Export from the Results screen or File > Export)

Screenshots

Name WebScraper 4 1 0 TNT.zip
Size 5.25 MB
Created on 2018-04-20 18:39:28
Hash f2752e3780453227e54ca5ec2eaa0aa105a4491b
Files WebScraper 4 1 0 TNT.zip (5.25 MB)