Data cleaning, also known as data cleansing or data scrubbing, is a process consisting of the following steps:
- Identifying inaccurate, incomplete, irrelevant, or corrupted data to remove it from further processing
- Parsing data, extracting information of interest, or validating whether a string of data is in an acceptable format
- Transforming data into a common representation, for example, a common character encoding (UTF-8), numeric type (int32), time scale, or normalized range
- Transforming data into a common data schema; for instance, if we collect temperature measurements from different types of sensors, we might want them to have the same structure
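
To make the steps above concrete, here is a minimal Python sketch that applies them to the temperature-sensor case. Everything in it is an assumption made for illustration: the raw field names (`temp_c`, `temperature`), the `Reading` schema, the sample values, and the rule that a value without a unit suffix is already in Celsius. It is one possible way to arrange the steps, not a reference implementation.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical raw readings from two sensor types (illustrative data only).
RAW_READINGS = [
    {"sensor": "A1", "temp_c": "21.5"},        # type A reports Celsius
    {"sensor": "B7", "temperature": "70.7F"},  # type B reports Fahrenheit
    {"sensor": "A2", "temp_c": ""},            # incomplete
    {"sensor": "B8", "temperature": "n/a"},    # corrupted
]


@dataclass
class Reading:
    """Common schema shared by all sensor types (assumed for this sketch)."""
    sensor_id: str
    celsius: float


# Validation: an optional sign, digits, an optional fraction, an optional unit.
_TEMP_PATTERN = re.compile(r"^(-?\d+(?:\.\d+)?)([CF]?)$")


def clean(record: dict) -> Optional[Reading]:
    """Return a normalized Reading, or None if the record is unusable."""
    # Parsing: pick the value from whichever field this sensor type uses.
    raw = record.get("temp_c") or record.get("temperature") or ""
    match = _TEMP_PATTERN.match(raw.strip())
    if match is None:
        return None  # identified as incomplete or corrupted

    value, unit = float(match.group(1)), match.group(2)
    # Common scale: convert Fahrenheit to Celsius; no suffix means Celsius.
    if unit == "F":
        value = (value - 32.0) * 5.0 / 9.0

    # Common schema: every sensor type ends up as the same Reading structure.
    return Reading(sensor_id=record["sensor"], celsius=round(value, 2))


def clean_all(records):
    """Clean every record and drop the ones that failed validation."""
    cleaned = (clean(r) for r in records)
    return [r for r in cleaned if r is not None]
```

Calling `clean_all(RAW_READINGS)` keeps the two valid readings, both expressed in Celsius with the same structure, and drops the empty and the `n/a` entries. Returning `None` for rejected records, rather than raising an exception, is a design choice that keeps the pipeline running on dirty input; a production system might log or quarantine the rejected records instead.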