Chapter 2 Data sources
Eric is responsible for collecting the data.
Fortunately, the dataset, which covers samples of Portuguese red Vinho Verde wine, is publicly available from the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/wine+quality. Note that, due to privacy and logistical issues, we have no access to wine brands, selling prices, etc. For more information, see the reference [Cortez et al., 2009] http://www3.dsi.uminho.pt/pcortez/wine/.
2.1 Attributes and outcome
In total, we have 11 attributes and 1 outcome (quality) for the wine dataset.
fixed acidity \((g/l)\): non-volatile acids that do not evaporate readily (e.g. tartaric acid)
volatile acidity \((g/l)\): the amount of acetic acid in wine; high levels lead to an unpleasant vinegar taste
citric acid \((g/l)\): increases wine acidity and adds a fresh flavor to wine
residual sugar \((g/l)\): the amount of sugar remaining after fermentation. Note that a wine with more than 45 g/l is considered a 'sweet wine'.
chlorides \((g/l)\): the amount of salt in wine
free sulfur dioxide \((mg/l)\): prevents microbial growth and the oxidation of wine
total sulfur dioxide \((mg/l)\): the sum of free SO2 and the SO2 bound to other chemicals in wine
density \((g/cm^{3})\): sweeter wines have a higher density
pH: the level of acidity on a scale of 0–14. Most wines range from 3 to 4 on the pH scale.
sulphates \((g/l)\): a food preservative to maintain the flavor and freshness of wine
alcohol \((vol.\%)\): the alcohol content by volume. Note that the alcohol content of wine is between 5% and 23% ABV.
quality: a grading scale of wine quality that ranges from 0 (very bad) to 10 (excellent).
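The definitions above can be written down as a small schema with a few derived checks. The following is a minimal Python sketch, not part of the original dataset tooling: the names `ATTRIBUTE_UNITS`, `bound_sulfur_dioxide`, `is_sweet`, and `is_valid_quality` are hypothetical, and the only facts it relies on are the units, the 45 g/l sweetness threshold, the free/total SO2 relationship, and the 0 to 10 quality scale stated in the list.

```python
# Hypothetical schema: attribute names mapped to the units listed above.
ATTRIBUTE_UNITS = {
    "fixed acidity": "g/l",
    "volatile acidity": "g/l",
    "citric acid": "g/l",
    "residual sugar": "g/l",
    "chlorides": "g/l",
    "free sulfur dioxide": "mg/l",
    "total sulfur dioxide": "mg/l",
    "density": "g/cm^3",
    "pH": "",  # dimensionless
    "sulphates": "g/l",
    "alcohol": "vol.%",
}
OUTCOME = "quality"

# Threshold quoted in the residual sugar description.
SWEET_WINE_THRESHOLD_G_PER_L = 45.0


def bound_sulfur_dioxide(free_mg_per_l, total_mg_per_l):
    """Bound SO2, using total = free + bound from the attribute list."""
    if total_mg_per_l < free_mg_per_l:
        raise ValueError("total SO2 cannot be smaller than free SO2")
    return total_mg_per_l - free_mg_per_l


def is_sweet(residual_sugar_g_per_l):
    """True when residual sugar exceeds the 45 g/l threshold."""
    return residual_sugar_g_per_l > SWEET_WINE_THRESHOLD_G_PER_L


def is_valid_quality(score):
    """True when the score is an integer on the 0 to 10 grading scale."""
    return isinstance(score, int) and not isinstance(score, bool) and 0 <= score <= 10
```

Helpers like these could serve as sanity checks before modelling, for example to flag a record whose total SO2 is below its free SO2.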
2.2 Number of records
- red wine: 1599 examples
2.3 Types of variables
All the attributes are numeric, and the outcome is an integer.
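As a rough illustration of how the record count and variable types could be checked, the sketch below parses delimited text with Python's standard `csv` module, casting the 11 attributes to `float` and the outcome to `int`. Several things here are assumptions rather than facts from this chapter: the function name `parse_wine_csv` is hypothetical, the file is assumed to be semicolon-delimited with a header row using the attribute names above, and the two rows in `SAMPLE` are made-up illustrative values, not actual records from the dataset.

```python
import csv
import io

ATTRIBUTES = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol",
]
OUTCOME = "quality"


def parse_wine_csv(text, delimiter=";"):
    """Parse delimited text into a list of dicts with typed values.

    Attributes are cast to float and the outcome to int; a malformed
    value raises ValueError, a missing column raises KeyError.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    records = []
    for row in reader:
        record = {name: float(row[name]) for name in ATTRIBUTES}
        record[OUTCOME] = int(row[OUTCOME])
        records.append(record)
    return records


# Illustrative, made-up rows (NOT real records from the dataset).
SAMPLE = "\n".join([
    ";".join(ATTRIBUTES + [OUTCOME]),
    "7.0;0.5;0.1;2.0;0.08;10;30;0.997;3.3;0.6;10.0;6",
    "8.1;0.3;0.4;2.5;0.07;15;45;0.996;3.2;0.7;11.5;7",
])

records = parse_wine_csv(SAMPLE)
print(len(records))  # number of parsed records
```

Applied to the full red wine file, the length of the returned list should match the 1599 examples reported in Section 2.2, provided the file layout matches the assumptions above.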