1. Dataset Overview
- Total images (raw): 2,527
- Original categories: 6
- Images after removing trash: 2,390
- Categories used for modelling: 5
- Uniform source resolution (px): 512 × 384
2. Class Distribution — Full Dataset
3. Class Distribution — Modelling Dataset (trash excluded)
4. Per-Category Descriptive Statistics
| category | count | mean_W | mean_H | mean_size_kb | mean_aspect |
|---|---|---|---|---|---|
| cardboard | 403 | 512.0 | 384.0 | 41.3 | 1.333 |
| glass | 501 | 512.0 | 384.0 | 38.7 | 1.333 |
| metal | 410 | 512.0 | 384.0 | 35.2 | 1.333 |
| paper | 594 | 512.0 | 384.0 | 44.1 | 1.333 |
| plastic | 482 | 512.0 | 384.0 | 37.8 | 1.333 |
| trash | 137 | 512.0 | 384.0 | 36.5 | 1.333 |
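The overview figures can be cross-checked against the per-category counts in the table above. The short sketch below redoes that arithmetic; the variable names are illustrative and not taken from the original notebook.

```python
# Per-category image counts, copied from the table above.
counts = {
    "cardboard": 403,
    "glass": 501,
    "metal": 410,
    "paper": 594,
    "plastic": 482,
    "trash": 137,
}

total_raw = sum(counts.values())

# Modelling dataset: every category except trash.
modelling = {name: n for name, n in counts.items() if name != "trash"}
total_modelling = sum(modelling.values())
mean_modelling = total_modelling / len(modelling)

# Ratio of the largest modelling class to the trash class.
largest = max(modelling, key=modelling.get)
imbalance_ratio = modelling[largest] / counts["trash"]

# Aspect ratio implied by the uniform 512 x 384 resolution.
aspect = 512 / 384

print(total_raw, total_modelling, mean_modelling)  # 2527 2390 478.0
print(largest, round(imbalance_ratio, 2), round(aspect, 3))  # paper 4.34 1.333
```

The totals match the overview (2,527 raw, 2,390 after removing trash), and the mean of 478 images per modelling class is the figure against which the 137 trash images are compared.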
5. File Size Distribution per Category
6. Per-Channel Pixel Statistics
| category | Red | Green | Blue |
|---|---|---|---|
| cardboard | 0.698 | 0.631 | 0.536 |
| glass | 0.424 | 0.441 | 0.464 |
| metal | 0.498 | 0.491 | 0.499 |
| paper | 0.847 | 0.835 | 0.822 |
| plastic | 0.541 | 0.519 | 0.511 |
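The report does not state how these values were computed. One plausible reading, assumed here, is that each entry is the mean pixel intensity of that channel over a category's images, scaled to [0, 1]. The sketch below shows that calculation with NumPy; `channel_means` is a hypothetical helper, and the synthetic arrays stand in for real images.

```python
import numpy as np


def channel_means(images):
    """Mean R, G, B intensity over all pixels of all images, scaled to [0, 1].

    `images` is an iterable of uint8 arrays shaped (H, W, 3).
    Assumption: 8-bit RGB input, so dividing by 255 maps values to [0, 1].
    """
    total = np.zeros(3, dtype=np.float64)
    n_pixels = 0
    for img in images:
        arr = np.asarray(img, dtype=np.float64)
        # Flatten the spatial dimensions and sum each channel separately.
        total += arr.reshape(-1, 3).sum(axis=0)
        n_pixels += arr.shape[0] * arr.shape[1]
    if n_pixels == 0:
        raise ValueError("no images supplied")
    return total / (n_pixels * 255.0)


# Synthetic 512 x 384 images for illustration (not dataset samples).
red = np.zeros((384, 512, 3), dtype=np.uint8)
red[..., 0] = 255
grey = np.full((384, 512, 3), 51, dtype=np.uint8)

means = channel_means([red, grey])
print(means.round(3))
```

Accumulating sums and pixel counts, rather than averaging per-image means, gives the same result here because all images share one resolution, and it stays correct if that ever changes.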
7. Key Observations
- Class imbalance: The trash category is severely underrepresented (137 images vs. a mean of ~478 across the other five classes). It is excluded from modelling so that the imbalance does not bias the classifier.
- Uniform image dimensions: All images share a resolution of 512 × 384 px, simplifying the preprocessing pipeline.
- EDA-level ambiguity cues: The sampled colour and texture profiles suggest that cardboard and paper can be visually similar, so confusion between this pair is a reasonable qualitative hypothesis from EDA alone.
- Main-notebook quantitative validation: Full test-split evaluation in the main ViT-Large notebook shows that the dominant residual confusion is plastic ↔ glass rather than paper ↔ cardboard, which also occurs but at a lower rate.
- Glass and plastic difficulty: The main notebook's confusion matrix reports plastic as the weakest class overall, which is consistent with the challenge of separating transparent and reflective materials under varied lighting and backgrounds.
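The last two observations are read off a confusion matrix. A minimal sketch of that reading follows, assuming rows are true classes and columns are predicted classes, and taking "weakest class" to mean lowest recall (the notebook's own metric is not stated here). Both helper functions are hypothetical, and the matrix holds toy values, not the notebook's results.

```python
def most_confused_pair(matrix, labels):
    """Return (label_a, label_b, errors) for the pair with the most mutual errors.

    matrix[i][j] counts samples of true class i predicted as class j.
    """
    best = None
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            # Count errors in both directions, e.g. plastic->glass and glass->plastic.
            errors = matrix[i][j] + matrix[j][i]
            if best is None or errors > best[2]:
                best = (labels[i], labels[j], errors)
    return best


def per_class_recall(matrix, labels):
    """Return {label: recall}, where recall = correct predictions / row total."""
    recalls = {}
    for i, label in enumerate(labels):
        row_total = sum(matrix[i])
        recalls[label] = matrix[i][i] / row_total if row_total else 0.0
    return recalls


# Toy values for illustration only; NOT the notebook's confusion matrix.
labels = ["glass", "paper", "plastic"]
toy = [
    [18, 0, 2],
    [1, 19, 0],
    [4, 1, 15],
]

pair = most_confused_pair(toy, labels)
recalls = per_class_recall(toy, labels)
weakest = min(recalls, key=recalls.get)
print(pair, weakest)  # ('glass', 'plastic', 6) plastic
```

Summing both directions of each off-diagonal pair matches the symmetric "↔" notation used above; if only one direction matters, compare `matrix[i][j]` and `matrix[j][i]` separately.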