Data cleaning

Data cleaning, also known as data cleansing or data scrubbing, is a process consisting of the following steps:

  1. Identifying inaccurate, incomplete, irrelevant, or corrupted data so that it can be removed from further processing
  2. Parsing data, extracting information of interest, or validating whether a string of data is in an acceptable format
  3. Transforming data into a common representation: a common encoding or data type (for example, UTF-8 or int32), a common time scale, or a normalized range
  4. Transforming data into a common data schema; for instance, if we collect temperature measurements from different types of sensors, we might want them to have the same structure
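
The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a definitive implementation: the two sensor record layouts, the field names (reading, temp_f, ts, time), the sample values, and the plausible temperature range are all hypothetical and chosen for the example.

```python
import re
from datetime import datetime, timezone

# Hypothetical raw records from two sensor types (made-up data for illustration).
RAW = [
    {"sensor": "A", "reading": "21.5C", "ts": "2024-01-01T12:00:00+00:00"},
    {"sensor": "B", "temp_f": 70.7, "time": 1704110400},  # Fahrenheit, Unix seconds
    {"sensor": "A", "reading": "n/a", "ts": "2024-01-01T12:05:00+00:00"},     # corrupted
    {"sensor": "A", "reading": "999.0C", "ts": "2024-01-01T12:10:00+00:00"},  # implausible
]

# Accepted format for type-A readings: a decimal number followed by "C".
READING_RE = re.compile(r"^(-?\d+(?:\.\d+)?)C$")

# Assumed plausible range in degrees Celsius; values outside it are discarded.
MIN_C, MAX_C = -90.0, 60.0


def clean(record):
    """Return the record in a common schema, or None if it should be dropped."""
    if "reading" in record:
        # Step 2: validate the string format and extract the numeric value.
        match = READING_RE.match(record["reading"])
        if match is None:
            return None  # Step 1: corrupted value, remove it
        celsius = float(match.group(1))
        ts = datetime.fromisoformat(record["ts"])
    elif "temp_f" in record:
        # Step 3: convert to a common unit (Celsius) and a common time scale.
        celsius = (record["temp_f"] - 32) * 5 / 9
        ts = datetime.fromtimestamp(record["time"], tz=timezone.utc)
    else:
        return None  # Step 1: unknown layout, treat as irrelevant

    if not MIN_C <= celsius <= MAX_C:
        return None  # Step 1: inaccurate value, remove it

    # Step 4: emit every record with the same structure.
    return {
        "sensor": record["sensor"],
        "celsius": round(celsius, 2),
        "timestamp": ts.astimezone(timezone.utc).isoformat(),
    }


cleaned = [r for r in map(clean, RAW) if r is not None]
for row in cleaned:
    print(row)
```

In this sketch, the two valid records end up with identical keys, units, and timestamp format, while the malformed and out-of-range readings are dropped. A real pipeline would likely log or quarantine rejected records rather than discard them silently.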