
Vivino: are these Ratings fair?


 
This post describes an exploratory data analysis in Python of data from Vivino. The dataset contains 13081 wines and, for each wine, its name, region, country, winery, rating, number of ratings, price, and year. The wine types in the dataset are Red, White, Rose, and Sparkling. Prices range from 3.55 euros to 1599 euros. Wines are rated on a scale of 1 to 5, and Vivino only publishes wines that have at least 25 ratings.
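The post does not show its code, so the following is only a rough sketch of how the data could be assembled with pandas. It assumes the Kaggle dataset provides one table per wine type and that the columns are named `Rating`, `NumberOfRatings`, and `Price`; neither is confirmed here. The rows are invented for illustration and the helper name `combine_wine_tables` is hypothetical.

```python
import pandas as pd


def combine_wine_tables(tables: dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Stack one table per wine type into a single frame with a 'Type' column."""
    labelled = [frame.assign(Type=wine_type) for wine_type, frame in tables.items()]
    return pd.concat(labelled, ignore_index=True)


# Invented example rows; the real tables would be loaded with pd.read_csv.
red = pd.DataFrame(
    {"Name": ["Red A", "Red B"], "Rating": [3.9, 4.4],
     "NumberOfRatings": [1200, 85], "Price": [9.50, 74.00]}
)
white = pd.DataFrame(
    {"Name": ["White A"], "Rating": [3.7],
     "NumberOfRatings": [430], "Price": [7.25]}
)

wines = combine_wine_tables({"Red": red, "White": white})

# Sanity checks matching the description: rating scale and minimum rating count.
assert wines["Rating"].between(1, 5).all()
assert (wines["NumberOfRatings"] >= 25).all()
print(wines[["Price", "Rating", "NumberOfRatings"]].describe())
```

Running the same checks on the full dataset would quickly show whether the stated price range and the 25-rating minimum actually hold.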

The motivation behind this project is to answer the following questions:

- What is the relationship between the rating of wine and the number of ratings?

- What is the relationship between the price of wine and the number of Ratings?

- What is the relationship between the price of wine and the rating?

Question 1: What is the relationship between the Rating of Wine and the Number of Ratings?

As mentioned above, wines can be rated from 1 to 5. In our dataset, however, the lowest rating is 2.5 and the highest is 5.0.

In the scatter plot above, the lower ratings (approximately below 3.3) have a low number of ratings, and so do ratings above 4.7. The mean rating is 3.87 with a standard deviation of 0.3. The plot shows that wines rated close to this mean (within one standard deviation) also have the highest number of ratings.
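The comparison between ratings near the mean and ratings far from it can also be checked numerically rather than by eye. Below is a minimal sketch using only the Python standard library, on invented (rating, number of ratings) pairs rather than the Vivino data; the helper name `split_by_band` is hypothetical.

```python
from statistics import mean, stdev


def split_by_band(wines):
    """Split (rating, n_ratings) pairs into those within one standard
    deviation of the mean rating and those outside it."""
    ratings = [rating for rating, _ in wines]
    mu, sigma = mean(ratings), stdev(ratings)
    inside = [w for w in wines if abs(w[0] - mu) <= sigma]
    outside = [w for w in wines if abs(w[0] - mu) > sigma]
    return mu, sigma, inside, outside


# Invented (rating, number of ratings) pairs for illustration only.
sample = [(3.0, 40), (3.6, 900), (3.8, 2500), (3.9, 3100),
          (4.0, 1800), (4.1, 700), (4.8, 60)]

mu, sigma, inside, outside = split_by_band(sample)
print(f"mean rating {mu:.2f}, standard deviation {sigma:.2f}")
print("mean number of ratings inside the band: ", mean(n for _, n in inside))
print("mean number of ratings outside the band:", mean(n for _, n in outside))
```

If the pattern described above holds, the mean number of ratings inside the band should be clearly higher than outside it.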

Furthermore, both scatter plots above are split by group: the first shows the wines by wine type and the second by continent. It is very clear that the wines with a higher number of ratings are most often Red wines, followed by White wines. Also, most wines with a higher rating are from either Europe or America.
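The visual impression that Red wines dominate the heavily rated wines can be backed by a per-group summary. Here is a sketch with pandas on invented rows, assuming columns named `Type` and `NumberOfRatings` (names not confirmed by this post). The same pattern would apply to a continent column.

```python
import pandas as pd

# Invented rows for illustration; not taken from the Vivino data.
wines = pd.DataFrame(
    {
        "Type": ["Red", "Red", "Red", "White", "White", "Rose", "Sparkling"],
        "NumberOfRatings": [5200, 1800, 300, 950, 410, 120, 260],
    }
)

# Per wine type: how many wines, and the median and maximum number of ratings.
summary = (
    wines.groupby("Type")["NumberOfRatings"]
    .agg(["count", "median", "max"])
    .sort_values("median", ascending=False)
)
print(summary)

# Share of the 3 most-rated wines that belong to each type.
top_share = wines.nlargest(3, "NumberOfRatings")["Type"].value_counts(normalize=True)
print(top_share)
```

The `count` column matters for interpretation: if one type simply has far more wines in the dataset, it will also appear more often among the most-rated wines.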

I looked into Vivino: the company originates from Scandinavia but is currently headquartered in San Francisco, United States. Could it be that Vivino therefore has more European and American users, so that wines from these continents are rated most often because they are the most easily accessible there? Or does the data suggest that Europe and America simply have the most wineries?

One possible explanation for why Red and White wines often have a high number of ratings while Sparkling and Rose wines do not is that Sparkling and Rose wines are more seasonal or occasional. It would be interesting to investigate further why Red wines are rated so often on Vivino.

See the next post for the answers to the second and third questions!



References: 

Kaggle dataset: https://www.kaggle.com/budnyak/wine-rating-and-price








Popular posts from this blog

Are expensive wines more exclusive?

Question 2: What is the relationship between the Price of wine and the Number of Ratings? From the above plot, it is very clear that very expensive wines are not rated often, and that the wines with the highest number of ratings are often cheap. I wonder if there is an upward bias: people are more likely to buy wines that are already rated often, so the number of ratings increases even further. This raises another question: do people associate a high price with high wine quality? Read the next post to find out!

Do people give more expensive wines a higher rating?

Question 3: What is the relationship between the Price and the Rating? I wanted to explore whether people are more likely to give higher-priced wines a higher rating. I expected this because people often associate quality with price. In the plot below, you can see that this expectation holds for our data: the highest-priced wines all have a high rating. The expensive wines with high ratings are also often Red wines.
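One way to quantify this relationship, rather than reading it off a plot, is a rank correlation, which is less sensitive to the very wide price range than a plain Pearson correlation. This is a suggested check, not something the post computed. The sketch below uses pandas to take the Pearson correlation of the ranks, which is Spearman's rank correlation, on invented rows with assumed column names `Price` and `Rating`.

```python
import pandas as pd

# Invented rows for illustration; not taken from the Vivino data.
wines = pd.DataFrame(
    {
        "Price": [4.5, 8.0, 12.0, 25.0, 60.0, 150.0, 900.0],
        "Rating": [3.4, 3.6, 3.5, 3.9, 4.1, 4.3, 4.6],
    }
)

# Spearman's rank correlation: Pearson correlation computed on the ranks.
spearman = wines["Price"].rank().corr(wines["Rating"].rank())

# Plain Pearson correlation on the raw values, for comparison.
pearson = wines["Price"].corr(wines["Rating"])

print(f"Spearman: {spearman:.2f}, Pearson: {pearson:.2f}")
```

A value close to 1 would support the expectation that higher-priced wines tend to receive higher ratings, although it says nothing about why.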