Ista 440 -

Unlike homework datasets, where the data is clean, project data often has to be gathered through web scraping, API calls (Twitter, Reddit, Yelp), or joins across disparate SQL databases. Students spend approximately 40% of their total time on this phase, using pandas to handle missing values, outliers, and inconsistent schemas.
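As a rough illustration of the three cleaning tasks named above, the sketch below reconciles two sources with mismatched schemas, fills a missing value, and drops an outlier. The column names, the sample values, and the specific choices (median imputation, the 1.5 × IQR rule) are assumptions made for the example, not requirements of the course.

```python
import pandas as pd

# Hypothetical data: one frame as it might come back from an API,
# one as it might come out of a SQL table with different column names.
reviews_api = pd.DataFrame({
    "business_id": ["a1", "b2", "c3", "d4"],
    "stars": [4.0, None, 5.0, 3.0],          # one missing rating
    "review_count": [120, 95, 110, 9000],    # 9000 is a likely outlier
})
reviews_sql = pd.DataFrame({
    "BusinessID": ["e5", "f6"],
    "rating": ["2", "4"],                    # numbers stored as strings
    "n_reviews": [101, 130],
})


def clean(api: pd.DataFrame, sql: pd.DataFrame) -> pd.DataFrame:
    # Inconsistent schemas: map the SQL columns onto the API names
    # and coerce the string ratings to numbers.
    sql = sql.rename(columns={
        "BusinessID": "business_id",
        "rating": "stars",
        "n_reviews": "review_count",
    })
    sql["stars"] = pd.to_numeric(sql["stars"], errors="coerce")
    df = pd.concat([api, sql], ignore_index=True)

    # Missing values: fill with the column median (one option among many).
    df["stars"] = df["stars"].fillna(df["stars"].median())

    # Outliers: keep rows within 1.5 * IQR of the quartiles.
    q1, q3 = df["review_count"].quantile([0.25, 0.75])
    iqr = q3 - q1
    in_range = df["review_count"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return df[in_range].reset_index(drop=True)


cleaned = clean(reviews_api, reviews_sql)
print(cleaned)
```

Whether to impute or drop missing values, and whether an extreme value is an error or a real observation, depends on the dataset; the choices here are only placeholders for that judgment.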