Employing the latest technology against unwanted online bots is a more cost-effective way to tackle screen scraping than trying to fight them in the courts, according to one software provider.
San Francisco-based Distil Networks estimates that on average 23% of traffic comes from ‘bad bots’, automated programmes that scrape data for all sorts of reasons, including competitor analysis.
Its solution analyses 40 types of information in order to identify malicious or ‘bad’ bots, unwelcome automated programmes that take data and degrade site performance.
In the UK, the forensic expert used by Ryanair has set up Data Portcullis, a similar technology solution to combat scraping.
Ryanair is itself continuing to pursue screen scrapers through European courts and has won the right to have its next case heard in its home city of Dublin.
During a workshop at last week’s Phocuswright conference in Florida, the firm showed how one customer, Canada-based Red Tag Vacations, was able to significantly increase its site’s up-time.
Red Tag, Canada’s largest independent tour operator, said that a pilot found the firm was getting 800,000 bot requests a day.
Orion Cassetto, director of product marketing at Distil Networks, said hackers are writing programmes to operate at scale.
“What we do is compare a whole bunch of data, and we have device fingerprinting which allows us to track hackers across multiple IPs and correlate that.
“We also have a known violators database and finally we do machine learning and behavioural analysis to understand what is normal behaviour for your traffic and find anomalous behaviour.
“Using technology is definitely going to be cheaper than litigation, and probably cheaper than a home-grown solution that has someone combing through web logs.”
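The behavioural-analysis step Cassetto describes can be sketched in miniature: group requests by a client fingerprint and flag clients whose volume is far outside the norm. This is a simplified illustration, not Distil's actual method; the fingerprint values, log format, and threshold below are all hypothetical, and a real system would model many more signals than raw request counts.

```python
from collections import Counter

# Hypothetical request log: (client_fingerprint, path) pairs.
# In practice a fingerprint combines many signals (headers, TLS
# details, JavaScript probes), not just an IP address.
REQUEST_LOG = [
    ("fp-human-1", "/search"),
    ("fp-human-1", "/book"),
] + [("fp-bot-9", f"/search?page={i}") for i in range(500)]

def flag_anomalous_clients(log, threshold=100):
    """Flag fingerprints whose request volume is far above normal.

    A stand-in for behavioural analysis: count requests per client
    and treat anything above the threshold as anomalous.
    """
    counts = Counter(fp for fp, _ in log)
    return {fp for fp, n in counts.items() if n > threshold}

print(flag_anomalous_clients(REQUEST_LOG))  # the high-volume client is flagged
```

Tracking by fingerprint rather than by IP is what lets this kind of check survive a scraper rotating through proxy addresses, which is the point Cassetto makes above.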
Rob Gennaro, digital marketing officer at Red Tag Vacations, said the firm did not realise the scale of the problem until it started working with Distil Networks.
“On certain peak days we were getting 800,000 bot requests a day. We were always getting time-outs and server errors. Our engineers were trying to figure out what was wrong with our database.
“It was all non-human traffic. The more our API got hit, the more our look-to-book ratios went up, and all of a sudden we were paying more for our traffic.”
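The look-to-book effect Gennaro describes is simple arithmetic: suppliers often charge per search (a “look”), so bot searches that never convert inflate the ratio and the bill. The 800,000 bot-request figure is from the article; the booking and human-search numbers below are hypothetical, for illustration only.

```python
def look_to_book(searches, bookings):
    """Searches per booking; a higher ratio means more paid-for
    searches that never turn into revenue."""
    return searches / bookings

# Hypothetical day: modest human traffic plus the 800,000 bot
# requests cited in the article.
human_searches = 10_000
bot_searches = 800_000
bookings = 100

print(look_to_book(human_searches, bookings))                 # ratio without bots
print(look_to_book(human_searches + bot_searches, bookings))  # ratio with bots
```

Even with these rough numbers, the bot traffic multiplies the ratio roughly eighty-fold, which is why the cost showed up in Red Tag's traffic bills before the cause was found.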
Gennaro said the firm is now able to pinpoint the source of the problem: who is scraping information and for what reason. This allows it to single out bots it would be happy to have scrape its information, serving them a page saying they are welcome to the data if they licence its API.
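The selective treatment Gennaro describes could be sketched as a simple per-bot policy in a request handler. This is an illustrative sketch only: the bot names, policy table, and page text below are invented, and a real deployment would key on fingerprints rather than easily spoofed user-agent strings.

```python
# Hypothetical policy table: which identified bots to block outright
# and which to steer towards a licensing offer.
KNOWN_BOTS = {
    "PriceCompareBot": "blocked",
    "FriendlyCrawler": "licence_offer",
}

LICENCE_PAGE = "This data is available under licence. Contact us for API access."

def handle_request(user_agent, page_body):
    """Return the response body for a request under the bot policy.

    Identified bots are either blocked or shown a licensing page;
    ordinary visitors get the real page.
    """
    policy = KNOWN_BOTS.get(user_agent)
    if policy == "blocked":
        return "403 Forbidden"
    if policy == "licence_offer":
        return LICENCE_PAGE
    return page_body

print(handle_request("FriendlyCrawler", "<html>prices</html>"))
```

The design choice here mirrors the article's point: once you can tell bots apart, blocking everything is not the only option; some scraping demand can be converted into licensed API business.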
Gennaro said that before it started blocking screen scrapers, it was possible for a firm to take all of Red Tag’s licenced TripAdvisor reviews, which cost the company $100,000 a year, for nothing.
Cassetto warned there are also increasing numbers of malicious bots looking to crack customer passwords so they can access personal information.
He said conventional security measures look only for attacks on vulnerable areas of website code, rather than for the mass abuse of websites that may cause competitive issues.