Now, we are going to write a complicated piece of software. A hamming code generator creates a piece of data that should be able to be recovered when it is transmitted between two mediums. These mediums could be anything from a computer to another computer or even a process to another process (although we should hope that data transmission between processes does not get corrupted). The data that we will be adding to accomplish this is known as hamming codes.
To write this piece of software, we will need to understand how a hamming code is generated and how we can use a verifier to make sure that the data that does cross from one medium to another is correct. Specifically, we will be looking at the creation of hamming data and the verification process. We won't be looking at recovering the data as this is close to the reverse process of creating the data.
To understand how hamming data is created, we will need to look at data at the bit level. This means that if we want to transmit the number 100, we need to know what that looks like in terms of bits. Bits are the lowest data unit for a computer. A bit can only be a 0 or a 1. As we add more bits together, they represent the power of 2. The following table should help showcase this a bit:
Bit 3 | Bit 2 | Bit 1 | Bit 0 |
8 | 4 | 2 | 1 |
2^3 | 2^2 | 2^1 | 2^0 |
As we can see, each bit location represents the next power of two. If we mix and match these bits together, we will find that we can represent all positive real numbers. There are also ways to represent negative and even floating-point numbers, but we will not be going into that here.
So, if we wanted to see these numbers in their binary form, we could go through them one at a time. The following table shows the decimal notation on the left and the binary representation on the right (decimal is what we are used to):
0 | 0000 |
1 | 0001 |
2 | 0010 |
3 | 0011 |
4 | 0100 |
5 | 0101 |
6 | 0110 |
7 | 0111 |
8 | 1000 |
Hopefully, this clarifies how bits and binary representation works. Now, we are going to move on to how hamming codes actually work. Hamming codes work by adding what is known as parity bits to special locations in the data transmission process. These parity bits will either be a 1 or a 0, depending on the type of parity that we select.
The two types of parity that we can choose are even parity and odd parity. Even parity means that when we add up all of the bit locations for that parity bit, they need to be an even number. If we choose odd parity, we need to add up all the bits for that parity location and check to make sure they are odd. Now, we need to decide what bits correspond to each parity bit location and even where the parity bits go.
First, we will take a look at where the parity bits go. Parity bits will be at the bit location for each power of 2. Just as we saw in the preceding table, we will host our parity bits at the following bit locations: 1, 2, 4, 8, and 16. If we look at the preceding table, we will notice that these correspond to the bits where there is only a single bit set.
Now, we need to decide which data bit locations correspond to our parity bit locations. Well, we might be able to guess these based on where the parity bits are located. For each data bit, we will look at if they have the corresponding parity bit set there. This can be seen in the following table:
Number (Decimal Format) | Is It A Parity Bit? | Parity Bits That Use This Data |
1 | Yes | N/A |
2 | Yes | N/A |
3 | No | 1, 2 |
4 | Yes | N/A |
5 | No | 1, 4 |
6 | No | 2, 4 |
7 | No | 1, 2, 4 |
8 | Yes | N/A |
The final piece that we need to know about is how to mesh our data with the parity data. The best way to look at this is through an example. Let's take the number 100 and turn it into its binary representation. We could do this by hand or we could pull up a programmer's calculator, which most operating systems have.
If we open up our calculator and type in 100, we should get the following binary representation of it: 1100100. Now, to add our parity bits, we will need to shift our data bits based on whether we place a parity bit there or not. Let's take this a step at a time:
- Is the first bit used as a parity bit? Yes, so we will place a 0 there and shift our data to the left once. We now have 11001000.
- Is the second bit used as a parity bit? Yes, so we will place a 0 there and shift our data to the left once. We now have 110010000.
- Is the third bit used as a parity bit? No, so we can put our original first data bit there, which is a zero. Our data looks the same as before: 110010000.
- Is the fourth bit used as a parity bit? Yes, so we will place a 0 there and shift our data to the left once. We now have 1100100000.
- Is the fifth bit used as a parity bit? No, so we will place our original second data bit there, which is a zero. Our data looks the same as before: 1100100000.
- Is the sixth bit used as a parity bit? No, so we will place our original third data bit there, which is a one. Our data looks the same as before: 1100100000.
- Is the seventh bit used as a parity bit? No, so we will place our original fourth data bit there, which is a zero. Our data looks as follows: 1100100000.
- Is the eight-bit used as a parity bit? Yes, so we will shift our data to the left one and place a zero there. Our data looks as follows: 11000100000.
For the rest of the numbers, they stay the same since we have no more parity bits to place. Now that we have our data, we have to set our parity bits. We will use even parity for our example and the code. The following table showcases the final number and the reason why we had to set a parity bit to one or zero:
Bit Location | Binary For Location | Do We Set It? | Count For Parity |
1 | 00001 | Yes | 1 |
1 | 00010 | Yes | 3 |
0 | 00011 | N/A | |
1 | 00100 | Yes | 1 |
0 | 00101 | N/A | |
1 | 00110 | N/A | |
0 | 00111 | N/A | |
0 | 01000 | No | 2 |
0 | 01001 | N/A | |
1 | 01010 | N/A | |
1 | 01011 | N/A |
As shown in the preceding table, we had to set the parity bits for the 1, 2, and 4 locations. Let's take a look at the second bit and go through the process. We will look for any bit locations where their binary representation has the second-bit set. If the bit is set at that location, we will count it. After adding up all of these numbers, if they add up to an odd number, we will need to set the parity bit location. For the second bit, we can see that the number 6, 10, and 11 locations have their second-bit set and that they have a 1 at them. This is why we have a count of three, which means we need to set our parity bit to make sure we have even parity.
This is a lot of information to take in, and rereading the preceding sections may help you understand how we got to this final parity number. If you want to find out more, go to https://www.geeksforgeeks.org/hamming-code-in-computer-network/.
Now, with all of this theory out of the way, let's go ahead and start writing our C program to be able to create parity data and also verify it.
First, let's create a file called hamming.c. We will create this as a pure library file, so we won't have the main function. Now, let's go ahead and stub out our functions just to get an idea of what we want to do. Follow these steps:
- To create our data, we will need to read in the data and move the data bits to the proper locations, the same way that we did previously. Let's go ahead and call this function placeBits:
void placeBits(int data, int* parity) {
}
// creation of true data point with parity bits attached
int createData(int data) {
int num = 0;
placeBits(data, &num);
return num;
}
We can see something interesting about the method signature of the placeBits function. It is taking in an int*. For JavaScript developers, this will be a new concept. We are passing in the location of the data instead of passing in the data itself. This is known as passing by reference. Now, the idea is similar to what it's like in JavaScript; that is, if we pass an object, we are passing the reference to it. This means that when we make changes to that data, we will see these changes in the original function. It is the same concept as the previous one, but we have a bit more control over this. If we don't pass by reference, it would pass by value, meaning we would get a copy of the preceding data and we wouldn't see the changes reflected in our createData function.
- Now, we need to have a function that figures out if we set the parity bit for that location. We will call this createParity. It should have a method signature that looks like this:
void createParity(int* data)
Again, we are passing in a reference to the data instead of passing the data itself.
- For our data checking algorithm, we will be going through each parity bit and checking the respective data locations for each. We will call this function checkAndVerifyData and it will have the following method signature:
int checkAndVerifyData(int data)
Now, instead of a Boolean, we will be passing back an int, where -1 means that the data is bad and 1 means that the data is good. In basic C, we don't have the concept of a Boolean, so we use numbers to represent the concepts of true or false (there is a Boolean in the stdbool header, but if we look at it, it utilizes the concept of 0 being false and 1 being true, so it still utilizes the numbers underneath). We can also make the system more robust by making each negative number mean a specific error code. In our case, we will just use -1, but this can be improved.
- Now, we can begin filling out our functions. First, we will place our data in the correct locations and make sure we have room for our parity bits. This will look as follows:
const int INT_SIZE = sizeof(int) * 8;
void placeBits(int data, int* parity) {
int currentDataLoc = 1;
int dataIterator = 0;
for(int i = 1, j = 0; i < INT_SIZE; i++, j++) {
if(ceil(log2(i)) == floor(log2(i))) continue; //we are at a
parity bit section
*parity |= ((data & (currentDataLoc << dataIterator)) << (j
- dataIterator));
dataIterator++;
}
}
First, we created a constant known as INT_SIZE. This allows us to handle different types of environments (although WebAssembly is supposed to be a standardized environment to work in, this allows us to use this C program elsewhere). We are also utilizing three special functions: ceil, floor, and log2. All of these can be found in the math library that comes with the standard library for C.
We get this by importing the header file at the top of our file:
#include <math.h>
The iteration process works like so:
- It checks to see if we are at a parity bit section. If we are, we will skip it and move on to the next section.
- If we are not at a parity bit section, we will take the bit in our data at dataIterator. This counter keeps count of our location in the data that we passed in. All of the preceding operations are bit operations. The | tells us we are doing a bitwise or, which means that the bit will be set to a 1 if the left-hand side (the parity variable), the right-hand side (our equation), or both are 1; otherwise, it will be a 0.
- We do a bitwise AND on our data with the bit set at our dataIterator. This will let us know if we have a bit set there. Finally, we need to make sure that we shift that bit by the number of parity bits that are already set (this is j – dataIterator).
- If we reach the bottom of this for loop, then we check a data bit, so we need to increment our dataIterator.
Now, we can fill in our createParity method with the following code:
void createParity(int* data) {
int parityChecks[4] = {1, 2, 4, 8};
int bitSet[4] = {1, 2, 8, 128};
for(int i = 0; i < 4; i++) {
int count = 0;
for(int j = 0; j < INT_SIZE; j++) {
if((parityChecks[i] & (j+1)) != 0) {
count += ((*data & (1 << j)) != 0) ? 1 : 0;
}
}
if( count % 2 != 0 ) {
*data |= bitSet[i];
}
}
}
This section can be a bit more complicated, but it is doing what we did by hand previously:
- First, we are only going to handle a certain amount of bits for our data, so we will only use four parity bits. These parity bits correspond to the 0, 1, 2, and 4-bit locations, which are the decimal numbers 1, 2, 4, and 8.
- Next, these bits are located at the 1, 2, 4, and 8-bit locations, which are represented as 1, 2, 8, and 128 in decimal form. This just makes it easier if we need to set the parity bit located there.
- Now, we will loop through each of our parity checks and see if our newly moved data has a bit set there:
if((parityChecks[i] & (j+1)) != 0) {
count += ((*data & (1 << j)) != 0) ? 1 : 0;
}
We are checking to make sure that the bit we are currently looking at is the data bit that we are worried about for that parity bit. If it is, we will add to the counter if the data bit is set there. We will do this by utilizing a bitwise AND with the data. If we don't get a zero, this means the bit is set, so we will add to the counter.
- At the end of this for loop, if we don't have even parity, we will need to set the parity bit at this location to get even parity.
Now, let's compile our program with the following command-line operation:
> emcc -s "EXPORTED_FUNCTIONS=['_createData']" hamming.c
Next, we need to go to our index.html page in the browser and run the following command in our developer's tools:
Module._createData(100);
By doing this, we should get the following output: 1579. If we put this decimal number into our programmer's calculator, we will get the following binary representation: 11000101011. If we head back up and check when we did this by hand, we will see that we got the exact same thing! This means our hamming data generator is working. Now, let's make the verification tool. Follow these steps to do so:
- Inside our checkAndVerifyData method, we will add the following code:
int checkAndVerifyData(int data) {
int verify = 0;
int parityChecks[4] = {1, 2, 8, 128};
for(int i = 0; i < 4; i++) {
verify = checkRow(&data, parityChecks[i]);
if(verify != 0) { // we do not have even parity
return -1;
}
}
return 1;
}
Here, we have a verify variable that will tell us if the data is good or not. If it isn't good, it will output our error status, which is a -1. Otherwise, we'll have run through the data and seen it was good, so we'll return a 1. Next, we'll utilize the parity bits which, as we already know, are held at the decimal numbers 1, 2, 8, and 128. We will loop through these and check our hamming data with it by utilizing the checkRow method.
- The checkRow method will utilize similar concepts to our creation process. This looks as follows:
int checkRow(int* data, int loc) {
int count = 0;
int verifier = 1;
for(int i = 1; i < INT_SIZE; i++) {
if((loc & i) != 0 ){
count += (*data & (verifier << (i - 1))) != 0 ? 1 : 0;
}
}
return count % 2;
}
Again, this should be very similar to our createParity method. We will run through the number and check to see if this is a parity bit number. If it is, we will utilize a bitwise AND operation at the location with a number we know has the bit set. If it is not equal to 0, then the bit is set and we update our counter. We will return our counter modded with 2 since this will tell us if we have even parity.
This should always return an even number (in our case, 0). If it doesn't, we instantly error out. Let's compile this with the following command:
> emcc -s "EXPORTED_FUNCTIONS=['_createData', '_checkAndVerifyData']" hamming.c
Now, we can head into our browser and use the number we got from the createData method. Head into the developer console and run the following command:
Module._checkAndVerifyData(1579);
It should print out a 1, which means that we have good hamming data! Now, let's try it with an example that we haven't worked out by hand: the number 1000. Run the following commands; we should get the same results:
Module._createData(1000); // produces 16065
Module._checkAndVerifyData(16065); //produces 1
Now, we have a working hamming data creation method and a verification tool all written in C and running in the browser! This should help you understand how to port existing applications to the browser, but also how to utilize this powerful technology that allows you to run near-native speeds for computationally intensive applications.
The final section of this chapter will take a look at one of the ports that's being utilized today, and even take a look at some of the code that goes into it. This library is utilized by a lot of application developers and is known as SQLite.