Challenge: A client asked us to scrape hotel and host descriptions, availability dates, pricing, booking conditions, and accommodation details, along with the accompanying photos, from listing adverts for monitoring and business intelligence purposes.
Solution: Our team built 6 separate CSV feeds containing the data the client requested:
1) Hotel listings data fields: place/hotel/room descriptions, locations, titles, IDs, URLs, coordinates;
For the coordinates, we also enabled our GeoLocation feature so this data would be parsed correctly.
2) Host feed: data on place/hotel owners;
3) Pricing feed: data related to apartment booking rates and periods;
Our team also cleaned up a large amount of numerical data to reconcile the different rate formats and time periods (per night, per week, per month, etc.);
4) Photos: links to images;
5) Review feed: data on reviews of apartments/places posted by users. This part was challenging, but we managed to extract the raw data and then added every single review to the feed as its own replicated record;
6) Calendar data feed.
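To illustrate the rate clean-up from the pricing feed (item 3), here is a minimal Python sketch that converts rates quoted per night, per week, or per month into a single per-night figure. It is not the team's actual script. The raw string formats (for example "$1,200 per month" or "700 / week"), the function name `normalize_rate`, and the conversion factors of 7 nights per week and 30 nights per month are all assumptions made for illustration.

```python
import re
from decimal import Decimal, ROUND_HALF_UP

# Assumed number of nights in each billing period (an illustrative choice, not the client's rule).
NIGHTS_PER_PERIOD = {"night": 1, "week": 7, "month": 30}

# Matches an amount followed by a period word, e.g. "$1,200 per month", "700 / week", "99.50 a night".
# These input formats are hypothetical examples of raw advert text.
RATE_PATTERN = re.compile(
    r"(?P<amount>\d[\d,]*(?:\.\d+)?)\s*(?:per|/|a)\s*(?P<period>night|week|month)",
    re.IGNORECASE,
)


def normalize_rate(raw: str) -> Decimal:
    """Convert a raw rate string into a per-night price rounded to cents."""
    match = RATE_PATTERN.search(raw)
    if match is None:
        raise ValueError(f"unrecognized rate format: {raw!r}")
    # Strip thousands separators before converting to an exact decimal amount.
    amount = Decimal(match.group("amount").replace(",", ""))
    nights = NIGHTS_PER_PERIOD[match.group("period").lower()]
    per_night = amount / nights
    return per_night.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

For instance, `normalize_rate("700 / week")` yields `Decimal("100.00")`. `Decimal` is used instead of `float` so currency values do not pick up binary rounding errors. In practice it may also be worth keeping the original period in its own column, since a monthly rate is often discounted rather than simply 30 nightly rates.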
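One way to read the "replicated record" approach in the review feed (item 5) is that each review becomes its own CSV row, with the parent listing's fields repeated on every row. The Python sketch below works under that assumption. The input shape (a list of listing dicts with a `reviews` list), the column names, and the helpers `flatten_reviews` and `write_review_feed` are hypothetical, and the sample data is made up.

```python
import csv
import io
from typing import Iterable, Iterator

# Hypothetical column layout for the review feed.
REVIEW_COLUMNS = ["listing_id", "listing_title", "reviewer", "review_date", "review_text"]


def flatten_reviews(listings: Iterable[dict]) -> Iterator[dict]:
    """Yield one row per review, repeating the parent listing's fields on each row."""
    for listing in listings:
        for review in listing.get("reviews", []):
            yield {
                "listing_id": listing["id"],
                "listing_title": listing["title"],
                "reviewer": review.get("author", ""),
                "review_date": review.get("date", ""),
                "review_text": review.get("text", ""),
            }


def write_review_feed(listings: Iterable[dict], out) -> int:
    """Write the flattened reviews as CSV to a text stream and return the number of rows."""
    writer = csv.DictWriter(out, fieldnames=REVIEW_COLUMNS)
    writer.writeheader()
    count = 0
    for row in flatten_reviews(listings):
        writer.writerow(row)
        count += 1
    return count


# Made-up sample input: one listing with two reviews, one listing with none.
listings = [
    {"id": "L1", "title": "Sea-view studio", "reviews": [
        {"author": "Ana", "date": "2023-05-01", "text": "Great location"},
        {"author": "Ben", "date": "2023-06-12", "text": "Clean, quiet"},
    ]},
    {"id": "L2", "title": "City loft", "reviews": []},
]
buffer = io.StringIO()
rows = write_review_feed(listings, buffer)
```

Listings without reviews contribute no rows, so the review feed stays one row per review. When writing to a real file instead of `io.StringIO`, open it with `newline=""` as the `csv` module recommends. Fields that contain commas, such as "Clean, quiet", are quoted automatically.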
Results: We obtained a large amount of raw data, as well as additional data from other sources (URLs or frames). We then wrote a dedicated script that parsed, formatted, and combined all of this data into a human-readable form.
For example, we had calendar