This project analyzes the available Airbnb data for Seville to get an overview of tourism trends in the city.
The following libraries are needed to run the notebook:
- numpy
- pandas
- itertools
- matplotlib
- seaborn
- datetime
- wordcloud
- PIL
- string
- nltk (word_tokenize, stopwords, RegexpTokenizer)
- os
The analysis answers the following questions:
- Is Seville a popular destination all year round? How do price and availability change?
- How has the market changed over the years? Did the pandemic have an impact?
- Is there a relationship between neighbourhood and price?
- Which words do tourists use most often when writing a review?
This is the Airbnb data that will be used for the analysis:
- listings.csv: Detailed data on the accommodations available in the city
- reviews.csv: Guest reviews
- calendar.csv: Availability and prices
- neighbourhoods.csv: List of the neighbourhoods of Seville
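
As a rough illustration of how the first question could be approached, the sketch below aggregates calendar data by month. It is not taken from the notebook: the function name is hypothetical, and the column names (date, available, price), the "t"/"f" availability flag and the "$1,234.00" price format are assumptions about how calendar.csv is laid out.

```python
import pandas as pd


def monthly_price_and_availability(calendar: pd.DataFrame) -> pd.DataFrame:
    """Average price and share of available nights per calendar month."""
    df = calendar.copy()
    df["date"] = pd.to_datetime(df["date"])
    # Assumed price format: "$1,234.00" -> 1234.0
    df["price"] = (
        df["price"].astype(str).str.replace(r"[$,]", "", regex=True).astype(float)
    )
    # Assumed availability flag: "t" = available, anything else = not available
    df["available"] = df["available"].eq("t")
    return (
        df.groupby(df["date"].dt.month)
        .agg(mean_price=("price", "mean"), availability=("available", "mean"))
        .rename_axis("month")
    )


if __name__ == "__main__":
    # Tiny made-up sample standing in for calendar.csv
    sample = pd.DataFrame(
        {
            "date": ["2020-01-01", "2020-01-02", "2020-02-01"],
            "available": ["t", "f", "t"],
            "price": ["$100.00", "$50.00", "$1,200.00"],
        }
    )
    print(monthly_price_and_availability(sample))
```

With the real file, the same function could be called on the result of `pd.read_csv("calendar.csv")`, provided the columns match the assumptions above.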
The notebook gave the following answers:
- Seville's popularity is roughly constant throughout the year, with four peaks in April, May, September and October. The price stays constant, and there is more availability during the first half of the year.
- Up to 2019, growth was steady. However, in 2020, when the pandemic started, the number of stays in Seville dropped sharply.
- The overall average price and the average price in the Casco Antiguo neighbourhood are very close, as expected, since apartments in this area account for around 70% of the city's total supply.
- Overall, the most common phrases are ‘great stay’, ‘great host’ and ‘highly recommend’; for the Casco Antiguo neighbourhood, they are ‘walking distance’, ‘bien situé’ and ‘bien ubicado’.
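
Two-word phrases like the ones in the last answer can be found by counting bigrams in the review text. The sketch below is a minimal standard-library version, not the notebook's code: the function name is hypothetical, and the hard-coded stop-word set is only a stand-in for the much larger nltk stopwords list named in the requirements.

```python
import re
from collections import Counter

# Small illustrative stop-word set (an assumption for this sketch);
# nltk's stopwords corpus would be far more complete.
STOP_WORDS = {"a", "the", "and", "was", "is", "in", "to", "of", "we", "it", "very"}


def top_bigrams(reviews, n=3):
    """Return the n most frequent word pairs across all review texts."""
    counts = Counter()
    for text in reviews:
        # Keep runs of letters only (accented letters included), lowercased
        tokens = [
            t for t in re.findall(r"[^\W\d_]+", text.lower()) if t not in STOP_WORDS
        ]
        # Pair each remaining token with the one that follows it
        counts.update(zip(tokens, tokens[1:]))
    return counts.most_common(n)


if __name__ == "__main__":
    # Made-up reviews, for illustration only
    reviews = [
        "Great stay, great host!",
        "A great stay in Seville",
        "We highly recommend it",
    ]
    print(top_bigrams(reviews, 1))
```

Because stop words are dropped before pairing, a pair may join two words that were separated by a stop word in the original text. Whether that is desirable depends on the phrases of interest.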
