Chapter 2 Data sources

Eric is responsible for collecting the data.

Fortunately, the dataset, related to Portuguese red Vinho Verde wine samples, is publicly available on UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/wine+quality. Notice that, due to privacy and logistic issues, we have no access to wine brands, selling price, etc. For more information, see the reference [Cortez et al., 2009] http://www3.dsi.uminho.pt/pcortez/wine/.

2.1 Attributes and outcome

In total, we have 11 attributes and 1 outcome (quality) for wine dataset.

  1. fixed acidity \((g/l)\): non-volatile acid that do not evaporate readily (e.g. tartaric acid)

  2. volatile acidity \((g/l)\): high acetic acid in wine that leads to an unpleasant vinegar taste

  3. citric acid \((g/l)\): increases wine acidity and adds a fresh flavor to wine

  4. residual sugar \((g/l)\): the amount of sugar remaining after fermentation. Note that more than 45 g/l is a 'sweet wine'.

  5. chlorides \((g/l)\): the amount of salt in wine

  6. free sulfur dioxide \((mg/l)\): prevents microbial growth and the oxidation of wine

  7. total sulfur dioxide \((mg/l)\): the portion of free SO2 plus the portion bound to other chemicals in wine

  8. density \((g/cm^{3})\): sweeter wines have a higher density

  9. pH: the level of acidity on a scale of 0–14. Most wines range from 3 to 4 on the pH scale.

  10. sulphates \((g/l)\): a food preservative to maintain the flavor and freshness of wine

  11. alcohol \((vol.\%)\): Note that the alcohol content of wine is between 5% and 23% ABV.

  12. quality: a grading scale of wine quality that ranges from 0 (very bad) to 10 (excellent).

2.2 Number of records

  • red wine: 1599 examples

2.3 Types of variables

All the attributes are numeric and the outcome is integer.