Hash functions are an essential part of the cybersecurity ecosystem. They are used widely; for example, they are a key component of the blockchains behind cryptocurrencies.
A hash function is a mathematical function that maps input values of arbitrary length to output values of fixed length. In other words, it creates a short digest, or digital fingerprint, of any data, reducing it to a small, fixed-size format. This fingerprint is called a hash value (also a hash sum, a hash code, or simply a hash). It usually looks like a random string of numbers and letters, because it is infeasible to observe or work out the original data from the hash value. A function with this property is called a one-way function. Like trying to un-bake a cake, reversing a cryptographic hash function is not practical, which is a vital feature for data protection.
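As a small illustration of the fixed-length fingerprint described above, here is a minimal sketch using SHA-256 from Python's standard `hashlib` module. The choice of SHA-256 and the helper name `fingerprint` are assumptions made for this example; the article does not name a specific algorithm.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

short = fingerprint(b"hi")
long_ = fingerprint(b"hi" * 1_000_000)

# Two bytes or two million bytes: the digest is always 64 hex characters.
print(len(short), len(long_))

# Changing a single letter gives a completely different-looking digest.
print(fingerprint(b"hello"))
print(fingerprint(b"hellp"))
```

Nothing in either digest hints at how long the input was or what it contained, which is the "random-looking string" behavior the paragraph describes.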
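If two different inputs to a hash function produce the same hash value, we call it a collision. Because there are unlimited possible inputs but only a fixed number of possible outputs, collisions can never be ruled out entirely, but a good hash function should make them as rare as possible. In a hash table, collisions that are not handled make the data difficult to search. In communication, if an attacker can find a collision, they can swap the original data for different data with the same hash value, so a matching hash no longer proves that the data has not been tampered with.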
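To make collisions concrete, the sketch below uses a deliberately weak toy hash that keeps only the first byte of a SHA-256 digest, so it has just 256 possible outputs. Both `tiny_hash` and `find_collision` are hypothetical names invented for this illustration; real hash functions have vastly larger output spaces, which is why finding collisions in them is so much harder.

```python
import hashlib
from itertools import count

def tiny_hash(data: bytes) -> int:
    """Toy hash: keep only the first byte of SHA-256 (256 possible outputs)."""
    return hashlib.sha256(data).digest()[0]

def find_collision():
    """Hash b"msg-0", b"msg-1", ... until two inputs share a tiny_hash value."""
    seen = {}  # hash value -> first input that produced it
    for i in count():
        msg = f"msg-{i}".encode()
        h = tiny_hash(msg)
        if h in seen:
            return seen[h], msg, h
        seen[h] = msg

a, b, h = find_collision()
print(a, b, h)  # two different messages with the same toy hash value
```

With only 256 outputs, a collision is guaranteed within the first 257 inputs and in practice tends to appear much sooner.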
Another important application is the hash table. A hash table stores data as pairs of keys and values. The key is the input to the hash function, and the function's output is used as an index that determines where the corresponding value is stored. When data is stored, its key is passed through the hash function to find the storage location. During lookup, the same key goes through the same process, which leads to the same location, and the value is read from there.
To explain further, imagine a phone book where the keys are the names and the values are the phone numbers. If two different names hash to the same storage location, looking up one name could return the other person's number. This is the collision mentioned earlier, and it can cause problems in a data search. Collisions that occur during data storage can be handled by techniques such as chaining and open addressing, but it is better to make them rare in the first place by ensuring that the hash function has the following characteristics.
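The phone book idea and the chaining remedy can be sketched as a tiny hash table. This is an illustrative toy, not how any particular library implements it: the class name `ChainedHashTable`, its methods, and the bucket count are all assumptions, and it relies on Python's built-in `hash()` to turn a key into a bucket index.

```python
class ChainedHashTable:
    """Toy hash table that resolves collisions by chaining."""

    def __init__(self, num_buckets: int = 8):
        # Each bucket is a list (a "chain") of (key, value) pairs.
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key) -> int:
        # The hash of the key decides which bucket the pair lives in.
        return hash(key) % len(self.buckets)

    def put(self, key, value) -> None:
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:      # key already stored: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # new key: add to the chain

    def get(self, key):
        # Compare the full key, so colliding names are never mixed up.
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        raise KeyError(key)

# A single bucket forces every name to collide, yet lookups stay correct.
phone_book = ChainedHashTable(num_buckets=1)
phone_book.put("Alice", "555-0100")
phone_book.put("Bob", "555-0199")
print(phone_book.get("Alice"), phone_book.get("Bob"))
```

Chaining keeps lookups correct even when every key collides, but each lookup then has to walk the whole chain, which is why a hash function that spreads keys evenly still matters.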
A given input value must always produce the same output value. This ensures that the same result is obtained for the same object at different times.
In cryptography, trying to find an original message that produces a given hash value is called a preimage attack. A good hash function should resist such attacks. This property is called preimage resistance, and it is a vital part of guaranteeing privacy during communications.
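With a preimage-resistant function, the only generic attack is to guess inputs and hash each one. The sketch below shows what that looks like for a deliberately tiny input space, a four-digit PIN. The helper names `sha256_hex` and `brute_force_pin` are hypothetical, and SHA-256 is an assumed choice of algorithm.

```python
import hashlib
import string
from itertools import product

def sha256_hex(text: str) -> str:
    """Hash a string with SHA-256 and return the hex digest."""
    return hashlib.sha256(text.encode()).hexdigest()

def brute_force_pin(target_hash: str, length: int = 4):
    """Try every digit string of the given length; return the match or None."""
    for digits in product(string.digits, repeat=length):
        guess = "".join(digits)
        if sha256_hex(guess) == target_hash:
            return guess
    return None

# Only 10,000 candidates, so the search finishes almost instantly.
print(brute_force_pin(sha256_hex("4821")))
```

The search succeeds only because there are so few possible PINs. Preimage resistance means there is no shortcut better than guessing, so for long, unpredictable inputs the same approach becomes computationally infeasible.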
Have you ever wondered how social networking sites store our account passwords? If they were stored in plain text, it would be easy for the site's engineers to read them. Instead, when we register, the site passes our password through a hash function and stores only the resulting hash value in the database. Later, when we log in, the password we submit is passed through the hash function again, and the new hash value is compared with the one in the database. If they match, the password was entered correctly; if not, it was wrong. Meanwhile, the hash value in the database does not reveal the original password (remember preimage resistance?).
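The register-then-login flow above can be sketched as follows. Note two additions that go beyond what the text describes: a random per-user salt, and PBKDF2 (a deliberately slow construction built on a hash function) in place of a single plain hash. Both are common practice for password storage, but they are assumptions of this sketch, as are the function names `register` and `verify` and the iteration count.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, chosen only for illustration

def register(password: str):
    """Return (salt, digest) to store in the database instead of the password."""
    salt = os.urandom(16)  # random per-user salt (an addition to the article's flow)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def verify(password: str, salt: bytes, stored_digest: bytes) -> bool:
    """Re-hash the submitted password and compare it with the stored digest."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    # compare_digest runs in constant time, which avoids leaking timing information.
    return hmac.compare_digest(candidate, stored_digest)

salt, stored = register("correct horse battery staple")
print(verify("correct horse battery staple", salt, stored))  # matching password
print(verify("wrong guess", salt, stored))                   # non-matching password
```

The database only ever holds the salt and the digest, so someone who reads it sees neither the password nor anything from which the password can be directly computed.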