One of the best-known repositories of machine learning datasets is hosted by the University of California, Irvine. The UCI repository contains over 300 datasets covering a wide variety of challenges, including poker, movies, wine quality, activity recognition, stocks, taxi service trajectories, advertisements, and many others. Each dataset is usually accompanied by a research paper in which the dataset was used, which can give you a hint on how to start and what the prediction baseline is.
The UCI machine learning repository can be accessed at https://archive.ics.uci.edu.
Another well-maintained collection by Xiaming Chen is hosted on GitHub: https://github.com/caesar0301/awesome-public-datasets.
The Awesome Public Datasets repository maintains links to more than 400 data sources from a variety of domains, ranging from agriculture, biology, and economics to psychology, museums, and transportation. Datasets specifically targeting machine learning are collected under the image processing, machine learning, and data challenges sections.