Categorical Data Statistics Example 4

Follow the full solution, then compare it with the other examples linked below.

Example 4

Difficulty: hard
A researcher records the following for 100 cars: colour (red, blue, white, black, other), fuel type (petrol, diesel, electric, hybrid), and fuel efficiency (km/L). (a) Identify all categorical variables. (b) Can the researcher find a correlation between colour and fuel type? Explain.

Solution

  1. Categorical variables: colour and fuel type, because their values are labels rather than measurements. Numerical variable: fuel efficiency.
  2. A correlation coefficient such as Pearson's r measures the linear association between two numerical variables. Since both colour and fuel type are categorical, a standard correlation coefficient cannot be computed. Instead, a two-way table (contingency table) and a chi-squared test could be used to check for association.

Answer

(a) Colour and fuel type are categorical. (b) No, standard correlation cannot be computed between two categorical variables; a two-way table or chi-squared test should be used instead.
Standard correlation measures require numerical data. For two categorical variables, association is assessed using contingency tables and tests like chi-squared. Choosing the wrong analysis method for the data type leads to meaningless results.
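
As a rough illustration of the contingency-table approach, the sketch below computes a chi-squared statistic by hand in Python. The counts are invented for illustration only (the example gives no actual data), and `chi_squared_statistic` is a hypothetical helper name, not part of any library.

```python
# Hypothetical two-way table for 100 cars (counts invented for illustration).
# Rows: colour (red, blue, white, black, other).
# Columns: fuel type (petrol, diesel, electric, hybrid).
observed = [
    [10, 5, 3, 2],   # red
    [8, 6, 4, 2],    # blue
    [12, 6, 4, 3],   # white
    [9, 7, 3, 1],    # black
    [6, 4, 3, 2],    # other
]


def chi_squared_statistic(table):
    """Return (chi-squared statistic, degrees of freedom) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Count expected in this cell if colour and fuel type were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


statistic, dof = chi_squared_statistic(observed)
print(f"chi-squared = {statistic:.3f}, degrees of freedom = {dof}")
```

The statistic would then be compared with a chi-squared distribution with (5 - 1) x (4 - 1) = 12 degrees of freedom. With only 100 cars spread over 20 cells, several expected counts fall below 5, so the chi-squared approximation would be rough for a table like this one.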

About Categorical Data

Categorical data is data that can be sorted into groups or categories, like colours, types, or names, rather than measured with numbers. You can count how many items fall into each category, but you cannot meaningfully add, subtract, or average the category labels themselves.
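
A short Python sketch of this idea, using a made-up list of colour labels (not data from the example): counting and proportions work, while averaging the labels does not.

```python
from collections import Counter

# Hypothetical colour labels, invented for illustration.
colours = ["red", "blue", "white", "red", "black", "white", "red", "other"]

# Counting how many items fall into each category is meaningful.
counts = Counter(colours)
mode, mode_count = counts.most_common(1)[0]
print(counts)
print("most common:", mode, mode_count)

# So are proportions, since they are built from the counts.
proportions = {colour: n / len(colours) for colour, n in counts.items()}
print(proportions)

# Averaging the labels themselves is not defined: Python raises a TypeError.
try:
    sum(colours) / len(colours)
except TypeError as exc:
    print("cannot average labels:", exc)
```

The mode (most frequent category) is the usual summary for categorical data, since a mean or median of the labels has no meaning.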

More Categorical Data Examples