Simplifying the write_csv() function

The write_csv() function uses a peewee method we have not seen before, dicts(), which returns query results as Python dictionaries. We build the query with the familiar Files.select().where() statement and append dicts() so that each row comes back as a dictionary keyed by column name. This format is an excellent input for our reports, because the built-in csv module provides a class named DictWriter. As its name suggests, this class accepts a dictionary and writes it as a row of data in a CSV file. With the query staged, we log a message to tell the user that we are starting to write the CSV report:

def write_csv(source, custodian_model):
    """
    The write_csv function generates a CSV report from the Files
        table
    :param source: The output filepath
    :param custodian_model: Peewee model instance for the
        custodian
    :return: None
    """
    query = Files.select().where(
        Files.custodian == custodian_model.id).dicts()
    logger.info('Writing CSV report')
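
To see why dictionaries are such a convenient input, the following minimal sketch feeds a few hand-made dictionaries to DictWriter. It is an illustration only: the rows and their values are hypothetical stand-ins for what a dicts() query yields, an in-memory buffer replaces the output file, and it assumes Python 3 with the standard csv module rather than the unicodecsv import used in this chapter.

```python
import csv
import io

# Hypothetical rows shaped like the dictionaries a dicts() query yields;
# the values are made up for illustration.
rows = [
    {'id': 1, 'file_name': 'report.docx', 'file_size': 20480},
    {'id': 2, 'file_name': 'notes.txt', 'file_size': 512},
]
cols = ['id', 'file_name', 'file_size']

# An in-memory text buffer stands in for the output file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=cols)
writer.writeheader()      # first line: the column names
for row in rows:
    writer.writerow(row)  # values are placed by key, not by position

output = buffer.getvalue()
print(output)
```

Because DictWriter looks up each value by key, the order of the keys inside each dictionary does not matter; the order of the cols list alone decides the column layout.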

Next, we define the column names for our CSV writer and open the user-specified output file using the with...as... statement. To initialize the csv.DictWriter class, we pass the open file object and the column headers, which correspond to the table's column names (and therefore to the dictionary key names). After initialization, we call the writeheader() method to write the table's header at the top of the spreadsheet. Finally, to write the row content, we use a for loop to iterate over the rows of our query object and write each one to the file with the writerow() method. Using the enumerate() function, we provide the user with a status update every 10,000 rows, so they know that our code is hard at work on larger file reports. After writing those status updates (and rows, of course), we add some additional log messages for the user and exit the function. Although we are calling the csv library, remember that it is actually our unicodecsv import. This means that we will encounter fewer encoding errors while generating our output than we would with the standard csv library:

    cols = [u'id', u'custodian', u'file_name', u'file_path',
        u'extension', u'file_size', u'ctime', u'mtime',
        u'atime', u'mode', u'inode']

    with open(source, 'wb') as csv_file:
        csv_writer = csv.DictWriter(csv_file, cols)
        csv_writer.writeheader()
        counter = 0  # stays 0 if the query returns no rows
        # Count from 1 so counter equals the number of rows written
        for counter, row in enumerate(query, 1):
            csv_writer.writerow(row)
            if counter % 10000 == 0:
                logger.debug('{:,} lines written'.format(counter))
        logger.debug('{:,} lines written'.format(counter))

    logger.info('CSV Report completed: ' + source)
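
The status-update pattern can also be tried in isolation. The sketch below is one possible variation, not the chapter's code: write_rows and its parameters rows, cols, out_file, and step are hypothetical names, and it assumes Python 3 with the standard csv module. It counts from 1 so that the logged number equals the number of rows written, and it initializes the counter so that an empty result set does not raise an error.

```python
import csv
import io
import logging

logger = logging.getLogger(__name__)


def write_rows(rows, cols, out_file, step=10000):
    """Write dict rows to out_file and return how many were written.

    write_rows, rows, cols, out_file and step are hypothetical names
    used only for this sketch.
    """
    writer = csv.DictWriter(out_file, fieldnames=cols)
    writer.writeheader()
    written = 0  # remains 0 when rows is empty
    # start=1 makes the counter equal to the number of rows written so far
    for written, row in enumerate(rows, start=1):
        writer.writerow(row)
        if written % step == 0:
            logger.debug('{:,} lines written'.format(written))
    return written


# Usage: 25 generated rows with a status update every 10 rows.
buffer = io.StringIO()
sample = ({'id': i, 'file_name': 'file_{}.txt'.format(i)} for i in range(25))
total = write_rows(sample, ['id', 'file_name'], buffer, step=10)

# An empty input writes only the header and reports zero rows.
empty_total = write_rows([], ['id', 'file_name'], io.StringIO())
```

Returning the count instead of only logging it makes the function easy to check, and passing step as a parameter lets small test inputs trigger the status message without generating 10,000 rows.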