The next action is to process the header bodies into subitems. To do this, add the declaration of the subitems method to the public section of the header_body class:
public:
    header_body() = default;
    header_body(const string& b) : body(b) {}
    string get_body() const { return body; }
    vector<pair<string, string>> subitems();
};
Each subitem will be a name/value pair, and since the order of the subitems may be important, they are stored in a vector. Change the main function: remove the call to get_headers and instead print out each header individually:
email eml;
eml.parse(stm);
for (auto header : eml) {
    cout << header.first << " : ";
    vector<pair<string, string>> subItems = header.second.subitems();
    if (subItems.size() == 0) {
        cout << header.second.get_body() << "\n";
    } else {
        cout << "\n";
        for (auto sub : subItems) {
            cout << " " << sub.first;
            if (!sub.second.empty())
                cout << " = " << sub.second;
            cout << "\n";
        }
    }
}
cout << "\n";
cout << eml.get_body() << endl;
Since the email class implements the begin and end methods, the ranged for loop will call these methods to get iterators over the email::headers data member. Each iterator gives access to a pair<string,header_body> object, so this code first prints the header name and then accesses the subitems of the header_body object. If there are no subitems, there will still be some text for the header, but it won't be split into subitems, so we call the get_body method to get the string to print. If there are subitems then these are printed out. Some subitems will have a value and some will not. If a subitem has a value then it is printed in the form name = value.
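To illustrate how the ranged for loop relies on begin and end, here is a minimal sketch. The names mini_email, add, and header_names are hypothetical, the headers are assumed to be held in a vector of pairs, and a plain string stands in for header_body, so this is not the chapter's email class.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the email class: only the parts that a ranged
// for loop needs are shown. A plain string replaces header_body.
class mini_email
{
    std::vector<std::pair<std::string, std::string>> headers;
public:
    using const_iterator =
        std::vector<std::pair<std::string, std::string>>::const_iterator;

    void add(const std::string& name, const std::string& body)
    {
        headers.push_back(std::make_pair(name, body));
    }

    // The ranged for loop calls these two members to obtain its iterators.
    const_iterator begin() const { return headers.begin(); }
    const_iterator end() const { return headers.end(); }
};

// Collects the header names in the order the loop visits them.
std::vector<std::string> header_names(const mini_email& eml)
{
    std::vector<std::string> names;
    for (const auto& header : eml)   // uses eml.begin() and eml.end()
    {
        names.push_back(header.first);
    }
    return names;
}
```

Iterating with const auto& rather than auto avoids copying each pair, which may be worth considering when header bodies are long.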
The final action is to parse the header bodies to split them into subitems. Below the header_body class, add the definition of the method:
vector<pair<string, string>> header_body::subitems()
{
    vector<pair<string, string>> subitems;
    if (body.find(';') == body.npos) return subitems;
    return subitems;
}
Since subitems are separated by semicolons, a simple first test is to look for a semicolon in the body string. If there is no semicolon, then an empty vector is returned.
Now the code must repeatedly parse through the string, extracting subitems. There are several cases that need to be addressed. Most subitems will be in the form name=value;, so this subitem must be extracted and split at the equals character and the semicolon discarded.
Some subitems do not have a value and are in the form name; in which case, the semicolon is discarded and an item is stored with an empty string for the subitem value. Finally, the last item in a header may not be terminated with a semicolon, so this must be taken into account.
Add the following while loop:
vector<pair<string, string>> subitems;
if (body.find(';') == body.npos) return subitems;
size_t start = 0;
size_t end = start;
while (end != body.npos){}
As the name suggests, the start variable is the start index of a subitem and end is the end index of a subitem. The first action is to ignore any whitespace, so within the while loop add:
while (start != body.length() && isspace(body[start]))
{
    start++;
}
if (start == body.length()) break;
This simply increments the start index while it refers to a whitespace character and as long as it has not reached the end of the string. If the end of the string is reached, it means there are no more characters and so the loop is finished.
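The whitespace-skipping step can be tried in isolation with a small helper. This is only a sketch: skip_whitespace is a hypothetical name, and the cast to unsigned char is a defensive addition that is not in the code above (passing a negative char value to isspace is undefined behavior, which can happen with non-ASCII bytes).

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the index of the first non-whitespace
// character at or after start, or body.length() if there is none.
std::size_t skip_whitespace(const std::string& body, std::size_t start)
{
    while (start != body.length() &&
           std::isspace(static_cast<unsigned char>(body[start])))
    {
        ++start;
    }
    return start;
}
```

A return value equal to body.length() corresponds to the case where the loop in subitems breaks because no more characters remain.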
Next, add the following to search for the = and ; characters and handle one of the search situations:
string name = "";
string value = "";
size_t eq = body.find('=', start);
end = body.find(';', start);
if (eq == body.npos)
{
    if (end == body.npos) name = body.substr(start);
    else name = body.substr(start, end - start);
}
else
{
}
subitems.push_back(make_pair(name, value));
start = end + 1;
The find method returns npos if the searched item cannot be found. The first call looks for the = character and the second call looks for a semicolon. If no = can be found then the item has no value, just a name. In that case, if the semicolon cannot be found either, the name is the entire string from the start index to the end of the string. If there is a semicolon, the name runs from the start index up to the index indicated by end (and hence the number of characters to copy is end-start). If an = character is found then the string needs to be split at this point, and that code will be shown in a moment. Once the name and value variables have been given values, they are added to the subitems vector and the start index is moved to the character after the end index. If the end index is npos then end + 1 wraps around to zero, which is not a meaningful start index, but this does not matter because the while loop tests the end index and exits if it is npos.
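The behavior of start = end + 1 when the search fails can be checked with a short sketch. The function next_start is a hypothetical name introduced only for this illustration; it relies on the fact that npos is the largest value of an unsigned type, so adding one wraps around to zero.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: searches for a semicolon from start and returns the
// index that "start = end + 1" would produce.
std::size_t next_start(const std::string& body, std::size_t start)
{
    std::size_t end = body.find(';', start);
    // When no semicolon is found, end is npos and end + 1 wraps to 0
    // because the arithmetic is unsigned.
    return end + 1;
}
```

For the string "a;b", a search from index 0 gives 2 (the character after the semicolon), while a search from index 2 finds nothing and gives 0. This is why the loop condition must test end rather than start.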
Finally, you need to add the code for when there is an = character in the subitem. Fill in the else clause as follows:
if (eq == body.npos)
{
    if (end == body.npos) name = body.substr(start);
    else name = body.substr(start, end - start);
}
else
{
    if (end == body.npos)
    {
        name = body.substr(start, eq - start);
        value = body.substr(eq + 1);
    } else {
        if (eq < end) {
            name = body.substr(start, eq - start);
            value = body.substr(eq + 1, end - eq - 1);
        } else {
            name = body.substr(start, end - start);
        }
    }
}
The first test in the else clause checks whether the search for a semicolon failed. In this case, the name is from the start index until the character before the equals character, and the value is the text following the equals sign until the end of the string.
If there are valid indices for the equals and semicolon characters then there is one more situation to check for. It is possible that the location of the equals character could be after the semicolon, in which case it means that this subitem does not have a value, and the equals character will be for a subsequent subitem.
At this point you can compile the code and test it with a file containing an email. The output from the program should be the email split into headers and a body, and each header split into subitems, which may be a simple string or a name=value pair.
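For experimenting without a complete email file, the parsing steps described above can be gathered into one standalone function. This is a condensed sketch rather than the chapter's exact code: split_subitems is a hypothetical free function that takes the body as a parameter, the name-only cases are folded into a single condition, and the unsigned char cast is a defensive addition.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical free-function version of header_body::subitems.
std::vector<std::pair<std::string, std::string>>
split_subitems(const std::string& body)
{
    std::vector<std::pair<std::string, std::string>> subitems;
    if (body.find(';') == body.npos) return subitems;

    std::size_t start = 0;
    std::size_t end = start;
    while (end != body.npos)
    {
        // Skip leading whitespace; stop if nothing is left.
        while (start != body.length() &&
               std::isspace(static_cast<unsigned char>(body[start])))
        {
            ++start;
        }
        if (start == body.length()) break;

        std::string name;
        std::string value;
        std::size_t eq = body.find('=', start);
        end = body.find(';', start);

        if (eq == body.npos || (end != body.npos && end < eq))
        {
            // No '=' belongs to this subitem, so it is a name only.
            // A count of npos makes substr copy to the end of the string.
            name = body.substr(start,
                end == body.npos ? body.npos : end - start);
        }
        else
        {
            // Split at '='; the value stops at ';' or the end of the string.
            name = body.substr(start, eq - start);
            value = body.substr(eq + 1,
                end == body.npos ? body.npos : end - eq - 1);
        }
        subitems.push_back(std::make_pair(name, value));
        start = end + 1;   // wraps to 0 when end is npos; the loop then exits
    }
    return subitems;
}
```

Calling split_subitems("text/plain; charset=us-ascii") yields two entries: text/plain with an empty value, followed by charset with the value us-ascii. A body with no semicolon, such as "text/plain", yields an empty vector, matching the early return in the method.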