This post walks through an exploratory data analysis in Python of data from Vivino. The dataset contains 13,081 wines and records, for each wine, the name, year, region, country, winery, rating, number of ratings, and price. The wine types in the dataset are Red, White, Rose, and Sparkling. Prices range from 3.55 euros to 1599 euros. Wines are rated on a scale of 1 to 5, and Vivino only publishes wines that have at least 25 ratings.
The motivation behind this project is to answer the following questions:
- What is the relationship between the rating of a wine and its number of ratings?
- What is the relationship between the price of a wine and its number of ratings?
- What is the relationship between the price of a wine and its rating?
Question 1: What is the relationship between the rating of a wine and its number of ratings?
As mentioned above, wines can be rated from 1 to 5. In our dataset, however, the lowest rating is 2.5 and the highest is 5.0.
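One way to put a number on this first question is sketched below. It is a minimal illustration, not the code used for this post: the rows are made-up toy values rather than Vivino data, and the field names `rating` and `n_ratings` as well as the helper `pearson` are assumptions chosen for the example. It reports the lowest and highest rating and a Pearson correlation between rating and number of ratings.

```python
from math import sqrt

# Made-up illustrative rows, NOT values from the Vivino dataset.
# The field names "rating" and "n_ratings" are assumptions for this sketch.
wines = [
    {"name": "Wine A", "rating": 3.2, "n_ratings": 2100},
    {"name": "Wine B", "rating": 3.6, "n_ratings": 1500},
    {"name": "Wine C", "rating": 3.9, "n_ratings": 900},
    {"name": "Wine D", "rating": 4.3, "n_ratings": 300},
    {"name": "Wine E", "rating": 4.6, "n_ratings": 60},
]


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


ratings = [w["rating"] for w in wines]
counts = [w["n_ratings"] for w in wines]

# Range of ratings actually present in the data.
lowest, highest = min(ratings), max(ratings)

# Linear association between rating and number of ratings.
r = pearson(ratings, counts)

print(f"lowest rating: {lowest}, highest rating: {highest}")
print(f"Pearson r (rating vs. number of ratings): {r:.2f}")
```

A coefficient near zero would suggest little linear relationship, while a clearly negative or positive value would suggest that heavily rated wines tend to score lower or higher. Because rating counts are usually very skewed, a rank-based measure or a scatter plot with a logarithmic axis may be a fairer view than a single linear coefficient.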

I looked into Vivino: the company originates from Scandinavia but is currently headquartered in San Francisco, United States. Could this mean that Vivino has more European and American users, so that wines from these continents are rated most often because they are the most easily accessible there? Or does the data suggest that Europe and America simply have the most wineries?
One possible explanation for why Red and White wines often have a high number of ratings, while Sparkling and Rose wines do not, is that Sparkling and Rose wines are more seasonal or occasional. It would be interesting to investigate further why Red wines in particular are rated so often on Vivino.
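The comparison between wine types can be checked with a simple grouping, sketched below. Again, this is only an illustration under assumptions: the rows are made-up toy values, and the field names `type` and `n_ratings` and the helper `ratings_by_type` are hypothetical names, not taken from the Vivino data.

```python
from collections import defaultdict
from statistics import median

# Made-up illustrative rows, NOT values from the Vivino dataset.
# The field names "type" and "n_ratings" are assumptions for this sketch.
wines = [
    {"type": "Red", "n_ratings": 2400},
    {"type": "Red", "n_ratings": 800},
    {"type": "White", "n_ratings": 600},
    {"type": "White", "n_ratings": 300},
    {"type": "Rose", "n_ratings": 90},
    {"type": "Sparkling", "n_ratings": 150},
    {"type": "Sparkling", "n_ratings": 50},
]


def ratings_by_type(rows):
    """Summarise the number of ratings per wine type."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["type"]].append(row["n_ratings"])
    return {
        wine_type: {
            "wines": len(counts),      # how many wines of this type
            "total": sum(counts),      # all ratings for this type combined
            "median": median(counts),  # typical number of ratings per wine
        }
        for wine_type, counts in grouped.items()
    }


summary = ratings_by_type(wines)

# Print the types from most to least rated overall.
for wine_type, stats in sorted(
    summary.items(), key=lambda item: item[1]["total"], reverse=True
):
    print(wine_type, stats)
```

Looking at the median next to the total helps separate two effects: a type can collect many ratings because it has many wines in the dataset, or because each individual wine is rated often.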
See the next post for the answers to the second and third questions!

