Data locality: executing map code on the node where the data resides. For this to work, the cluster must have an appropriate topology, the Hadoop map code must be able to read its data locally, and Hadoop must be aware of the topology of the nodes where tasks run. Since TaskTracker nodes execute the map tasks, the Hadoop scheduler needs this topology information to assign each task to the right node. In other words, whenever you run a MapReduce program against a particular piece of HDFS data, you want that program to run on the node, or machine, that actually stores the data in HDFS. Doing so makes the job run much faster, because it avoids moving large amounts of data across the network.
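The scheduler sees this location information through the job's input splits: each split reports the hosts that hold its underlying block, and the scheduler tries to launch the corresponding map task on one of those hosts. The sketch below is a minimal, hypothetical example (the input directory /data/example is a placeholder) that prints each split together with its preferred hosts.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class SplitLocationsDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "split-locations-demo");

        // Hypothetical input directory; point this at real data in your cluster.
        FileInputFormat.addInputPath(job, new Path("/data/example"));

        // Compute the input splits the scheduler would see. Each split lists
        // the hosts holding its block replicas; the scheduler tries to run the
        // corresponding map task on one of those hosts.
        List<InputSplit> splits = new TextInputFormat().getSplits(job);
        for (InputSplit split : splits) {
            System.out.println(split + " -> "
                    + Arrays.toString(split.getLocations()));
        }
    }
}
```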
When a MapReduce job is executed, part of what the JobTracker does is ask the NameNode which machines hold the data the job needs. When a file is written to HDFS, it is split into blocks, and each block is replicated three times by default: the first replica is stored on the node that wrote the data (when that node is part of the cluster), while the second and third replicas are stored on separate machines. This is part of Hadoop's distributed design.
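You can observe this placement yourself by asking the NameNode for a file's block locations through the HDFS client API; each block reports the hosts holding its replicas. The following is a minimal sketch, and the path /data/example/input.txt is a hypothetical placeholder for a file that exists in your cluster.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationsDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical HDFS path; replace with a real file.
        Path file = new Path("/data/example/input.txt");

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        FileStatus status = fs.getFileStatus(file);
        // Ask the NameNode which DataNodes hold each block of the file.
        BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());

        for (BlockLocation block : blocks) {
            System.out.printf("offset=%d length=%d hosts=%s%n",
                    block.getOffset(), block.getLength(),
                    String.join(",", block.getHosts()));
        }
        fs.close();
    }
}
```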
Storing each block on three machines thus gives you a much higher chance of achieving data locality, since it is likely that at least one of those machines will have a free task slot and can process the block it stores locally.
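After a job finishes, Hadoop's built-in job counters report how many map tasks actually ran data-local or rack-local, so you can check how often locality was achieved. The helper below is a sketch that assumes you already have a completed Job object from your driver (for example, after calling job.waitForCompletion(true)).

```java
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobCounter;

public class LocalityCountersDemo {
    // Assumes `job` has already completed.
    public static void printLocality(Job job) throws Exception {
        Counters counters = job.getCounters();
        long dataLocal = counters.findCounter(JobCounter.DATA_LOCAL_MAPS).getValue();
        long rackLocal = counters.findCounter(JobCounter.RACK_LOCAL_MAPS).getValue();
        long totalMaps = counters.findCounter(JobCounter.TOTAL_LAUNCHED_MAPS).getValue();

        System.out.printf("data-local maps: %d / %d%n", dataLocal, totalMaps);
        System.out.printf("rack-local maps: %d / %d%n", rackLocal, totalMaps);
    }
}
```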