# What is a compression function?

Damgård and Merkle defined hash functions in terms of a so-called compression function, an approach that has greatly influenced the design of cryptographic hash functions.

A compression function takes a fixed-length input and returns a shorter, fixed-length output. A hash function can then be defined by applying the compression function repeatedly until the entire message has been processed.

In this process, a message of arbitrary length is split into blocks whose length is determined by the compression function, and it is “padded” (for security reasons) so that its size is a multiple of the block size.

The blocks are then processed sequentially: each step takes the hash result so far and the current message block as input, and the final output is the hash value of the message.
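A minimal sketch of this iterated construction follows. It assumes a hypothetical toy compression function `toy_compress` built from truncated SHA-256, a 64-byte block, a 16-byte chaining value, and MD-style padding with a 64-bit length field. These names and sizes are illustrative only and do not belong to any standard.

```python
import hashlib
import struct

BLOCK_SIZE = 64          # bytes per message block (hypothetical, MD5/SHA-1-like)
STATE_SIZE = 16          # bytes of chaining value and output (hypothetical)
IV = bytes(STATE_SIZE)   # fixed initial chaining value (all zeros, illustration only)


def toy_compress(chaining: bytes, block: bytes) -> bytes:
    """Hypothetical compression function: fixed-length input
    (16-byte state + 64-byte block) -> shorter fixed-length output (16 bytes)."""
    assert len(chaining) == STATE_SIZE and len(block) == BLOCK_SIZE
    return hashlib.sha256(chaining + block).digest()[:STATE_SIZE]


def pad(message: bytes) -> bytes:
    """Append 0x80, then zero bytes, then the 64-bit message length in bits,
    so the padded length is a multiple of BLOCK_SIZE."""
    padded = message + b"\x80"
    padded += b"\x00" * ((BLOCK_SIZE - 8 - len(padded)) % BLOCK_SIZE)
    return padded + struct.pack(">Q", 8 * len(message))


def iterated_hash(message: bytes) -> bytes:
    """Process the padded message block by block, feeding the running
    hash value and the current block into the compression function."""
    state = IV
    padded = pad(message)
    for i in range(0, len(padded), BLOCK_SIZE):
        state = toy_compress(state, padded[i:i + BLOCK_SIZE])
    return state


print(iterated_hash(b"hello").hex())
```

Encoding the original length in the final block (often called Merkle–Damgård strengthening) is the usual reason the padding step matters for security.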

## What are pseudo-collisions?

A pseudo-collision is a collision of the compression function at the heart of an iterated hash function. A collision of a hash function’s compression function can be useful when constructing a collision of the hash function itself, but this is usually not the case.

While pseudo-collisions may be seen as an unfortunate property of a hash function, a pseudo-collision is not the same as a collision, and the hash function can still be secure.

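MD5 is an example of a hash function for which pseudo-collisions were found while it was still considered secure. Full collisions for MD5 were later found (Wang et al., 2004), and MD5 is no longer considered collision-resistant.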
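To make the distinction concrete, the sketch below uses a hypothetical compression function `toy_compress` with a deliberately tiny 16-bit output. It searches for two different chaining values that produce the same output with the same message block. Because the attacker is free to choose the chaining values, this is only a pseudo-collision. A collision of the full hash function would have to start from the fixed initial value.

```python
import hashlib


def toy_compress(chaining: int, block: bytes) -> int:
    """Hypothetical compression function with a deliberately tiny
    16-bit chaining value and output, so the search finishes instantly."""
    data = chaining.to_bytes(2, "big") + block
    return int.from_bytes(hashlib.sha256(data).digest()[:2], "big")


def find_pseudo_collision(block: bytes) -> tuple[int, int]:
    """Find two different chaining values h1 != h2 with
    toy_compress(h1, block) == toy_compress(h2, block)."""
    seen: dict[int, int] = {}
    for h in range(2**16):
        out = toy_compress(h, block)
        if out in seen:
            return seen[out], h
        seen[out] = h
    raise RuntimeError("no pseudo-collision found")


h1, h2 = find_pseudo_collision(b"same message block")
print(h1, h2, toy_compress(h1, b"same message block"))
```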

## What are MD2, MD4, and MD5?

MD2, MD4, and MD5 are message digest algorithms developed by Rivest.

They are intended for digital signature applications, in which a large message must be ‘compressed’ in a secure manner before it is signed with a private key.

All three algorithms accept messages of arbitrary length and generate 128-bit message digests.
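Python's standard `hashlib` module can illustrate the fixed digest length. MD2 and MD4 are generally not available in `hashlib` (it depends on the underlying OpenSSL build), so this small sketch shows only MD5.

```python
import hashlib

# MD5 accepts messages of any length but always produces a 128-bit digest.
messages = [b"", b"abc", b"x" * 1_000_000]
digest_bits = [len(hashlib.md5(m).digest()) * 8 for m in messages]
print(digest_bits)  # [128, 128, 128]
print(hashlib.md5(b"abc").hexdigest())
```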

Although the structures of these algorithms are somewhat similar, the design of MD2 is quite different from that of MD4 and MD5, because MD2 was optimized for 8-bit machines, whereas MD4 and MD5 were aimed at 32-bit computers.

Descriptions and source code for the three algorithms can be found in Internet RFCs 1319 – 1321.

MD2 was developed by Rivest in 1989. The message is first padded so that its length in bytes is a multiple of 16. A 16-byte checksum is then appended to the end of the message, and the hash value is computed on the resulting message.
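A minimal sketch of the MD2 padding step, following the rule in RFC 1319 (append `i` bytes, each of value `i`). The checksum step is omitted because it depends on the 256-byte substitution table defined in the RFC; `md2_pad` is an illustrative helper name.

```python
def md2_pad(message: bytes) -> bytes:
    """Append i bytes, each of value i (1 <= i <= 16), so the length becomes
    a multiple of 16. Padding is always applied, even to a message whose
    length is already a multiple of 16."""
    i = 16 - (len(message) % 16)
    return message + bytes([i]) * i


print(md2_pad(b"abc").hex())  # 616263 followed by thirteen 0d bytes
```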

Rogier and Chauvaud found that collisions for MD2 can be constructed if the checksum computation is omitted. At the time, this was the only known cryptanalytic result on MD2.

MD4 was developed by Rivest in 1990. The message is padded so that its length in bits plus 64 is divisible by 512. A 64-bit binary representation of the original message length is then appended to the message.
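The padding rule can be sketched as follows, assuming the little-endian length encoding used by MD4 (RFC 1320). `md4_pad` is an illustrative helper name.

```python
import struct


def md4_pad(message: bytes) -> bytes:
    """Append a single 1 bit (byte 0x80), then 0 bits until the length is
    448 mod 512 bits, then the original bit length as a 64-bit
    little-endian integer."""
    bit_length = (8 * len(message)) % 2**64    # only the low 64 bits are kept
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)  # 56 bytes = 448 bits
    return padded + struct.pack("<Q", bit_length)


for n in (0, 3, 55, 56, 64):
    print(n, len(md4_pad(b"a" * n)))  # 64, 64, 64, 128, 128 bytes
```

A 56-byte message is already 448 bits long, yet it still receives a full extra block, because padding is always applied.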

The message is processed in 512-bit blocks in a Damgård/Merkle iterative structure, and each block is processed in three distinct rounds.
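RFC 1320 distinguishes MD4's three rounds by the bitwise auxiliary function each one applies to the 32-bit state words. Here is a minimal sketch of those three functions, masked to 32 bits:

```python
MASK32 = 0xFFFFFFFF


def F(x: int, y: int, z: int) -> int:
    """Round 1: bitwise selection ("if x then y else z")."""
    return ((x & y) | (~x & z)) & MASK32


def G(x: int, y: int, z: int) -> int:
    """Round 2: bitwise majority of x, y and z."""
    return ((x & y) | (x & z) | (y & z)) & MASK32


def H(x: int, y: int, z: int) -> int:
    """Round 3: bitwise parity (XOR) of x, y and z."""
    return (x ^ y ^ z) & MASK32


print(hex(F(0xFFFF0000, 0x12345678, 0x9ABCDEF0)))  # 0x1234def0
```

Each round applies its function in 16 steps, one per 32-bit word of the 512-bit block, combined with modular additions and left rotations. Those steps are not shown here.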

Den Boer and Bosselaers, among others, very quickly developed attacks on versions of MD4 in which the first or the last round is omitted.

Dobbertin has shown how collisions for the full version of MD4 can be found in under a minute on a typical PC. Clearly, MD4 should now be considered broken.

MD5 was developed by Rivest in 1991.

It adds the concept of “safety belts” to MD4, making it slightly slower than MD4 but more secure.

The algorithm consists of four distinct rounds and has a slightly different design from that of MD4.

The message digest size and padding requirements remain the same.
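Because MD5 keeps MD4's padding and 128-bit digest but uses four rounds, the whole algorithm fits in a short educational sketch. The version below is a compact reformulation of RFC 1321 (additive constants derived from the sine function, per-step rotation amounts, and a different message-word order in each round). It is checked against `hashlib.md5` and is meant for illustration only; MD5 should not be used where collision resistance matters.

```python
import hashlib
import math
import struct

MASK32 = 0xFFFFFFFF
# 64 additive constants: T[i] = floor(2**32 * |sin(i + 1)|)
T = [int(abs(math.sin(i + 1)) * 2**32) & MASK32 for i in range(64)]
# Per-step left-rotation amounts for rounds 1-4
S = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4


def rotl(x: int, c: int) -> int:
    x &= MASK32
    return ((x << c) | (x >> (32 - c))) & MASK32


def md5_pad(message: bytes) -> bytes:
    # Same padding as MD4: 0x80, zeros to 448 mod 512 bits, 64-bit LE length
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    return padded + struct.pack("<Q", (8 * len(message)) % 2**64)


def md5(message: bytes) -> str:
    a0, b0, c0, d0 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476
    data = md5_pad(message)
    for offset in range(0, len(data), 64):          # Damgård/Merkle iteration
        M = struct.unpack("<16I", data[offset:offset + 64])
        A, B, C, D = a0, b0, c0, d0
        for i in range(64):                          # 4 rounds x 16 steps
            if i < 16:
                f, g = (B & C) | (~B & D), i
            elif i < 32:
                f, g = (D & B) | (~D & C), (5 * i + 1) % 16
            elif i < 48:
                f, g = B ^ C ^ D, (3 * i + 5) % 16
            else:
                f, g = C ^ (B | ~D), (7 * i) % 16
            f = (f + A + T[i] + M[g]) & MASK32
            A, D, C = D, C, B
            B = (B + rotl(f, S[i])) & MASK32
        # Feed-forward: add this block's result to the chaining value
        a0, b0 = (a0 + A) & MASK32, (b0 + B) & MASK32
        c0, d0 = (c0 + C) & MASK32, (d0 + D) & MASK32
    return struct.pack("<4I", a0, b0, c0, d0).hex()


for msg in (b"", b"abc", b"The quick brown fox jumps over the lazy dog"):
    assert md5(msg) == hashlib.md5(msg).hexdigest()
```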

Den Boer and Bosselaers found pseudo-collisions for MD5, but at the time there were no other known cryptanalytic results.

Van Oorschot and Wiener considered a brute-force search for collisions in hash functions and estimated that a collision-search machine designed specifically for MD5 (costing \$10 million in 1994) could find an MD5 collision every 24 days on average.

The general technique can be applied to other hash functions.
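As an illustration of a generic attack that ignores the internals of the hash function, the sketch below runs a plain birthday search on MD5 truncated to 32 bits (`truncated_md5` is a hypothetical helper). It needs on the order of $2^{16}$ evaluations, compared with about $2^{64}$ for full 128-bit MD5. Van Oorschot and Wiener's design used a memory-efficient parallel collision search with distinguished points, which this simple lookup-table version does not attempt to reproduce.

```python
import hashlib
from itertools import count


def truncated_md5(data: bytes, bits: int) -> int:
    """Top `bits` bits of MD5, standing in for a small n-bit hash."""
    full = int.from_bytes(hashlib.md5(data).digest(), "big")
    return full >> (128 - bits)


def birthday_collision(bits: int) -> tuple[bytes, bytes]:
    """Hash distinct inputs and remember each output until one repeats.
    Expected work is about 2**(bits/2) evaluations for any hash function."""
    seen: dict[int, bytes] = {}
    for i in count():
        msg = str(i).encode()
        h = truncated_md5(msg, bits)
        if h in seen:
            return seen[h], msg
        seen[h] = msg


a, b = birthday_collision(32)
print(a, b, hex(truncated_md5(a, 32)))
```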

## What are Secure Hash Algorithms (SHA and SHA-1)?

The Secure Hash Algorithm (SHA), the algorithm specified in the Secure Hash Standard (SHS), was developed by NIST and published as a Federal Information Processing Standard (FIPS PUB 180).

SHA-1 is a revision of SHA published in 1994.

The revision corrected an unpublished flaw in SHA.

Its design is very similar to that of the MD4 family of hash functions developed by Rivest.

The algorithm takes a message of less than $2^{64}$ bits in length and produces a 160-bit message digest.

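The algorithm is slightly slower than MD5, but its larger message digest makes it more secure against brute-force collision and inversion attacks. (Practical SHA-1 collisions were nevertheless demonstrated in 2017, so SHA-1 is no longer recommended where collision resistance is required.)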
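The effect of digest length on generic attacks follows from standard estimates: for an $n$-bit digest, a brute-force collision search needs about $2^{n/2}$ evaluations (the birthday bound), and brute-force inversion needs about $2^{n}$.

$$
\begin{aligned}
\text{collision (birthday bound):}\quad & \approx 2^{n/2} \text{ evaluations} \\
\text{inversion (preimage):}\quad & \approx 2^{n} \text{ evaluations} \\
\text{MD5 } (n = 128):\quad & 2^{64} \text{ and } 2^{128} \\
\text{SHA-1 } (n = 160):\quad & 2^{80} \text{ and } 2^{160}
\end{aligned}
$$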

## Emerging Privacy-Preserving Technologies

Many international organizations regard the protection of privacy as a basic requirement and have set out principles such as collection limitation, data quality, purpose specification, use limitation, security safeguards, openness, individual participation, and accountability.

These principles help manage privacy requirements over the life cycle of a system.

As system complexity increases and storage and computing units are no longer necessarily centralized, reducing the risk of privacy disclosure becomes challenging.

Such systems, for example those based on IoT sensors, wearable computing devices, mobile computing, and smart meters, require stronger privacy technologies and protocols.

These privacy techniques should take into account the deployment architecture, the availability of individual nodes in the computing system, sensitive data flows, and the threat model.

## Privacy Protection in Distributed Environments

Taking the hospital as an example again: to build a global model for disease prediction while protecting privacy, a local model is trained on the local data on each user’s mobile device.

Each device sends its learned model parameters to the cloud server, which aggregates them to build a global model. The learned global model is then pushed back to each user’s mobile device for prediction.

This is the simple federated learning architecture.
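The train-locally, aggregate-centrally loop can be sketched as follows. This assumes FedAvg-style aggregation (a data-size-weighted average of the parameters), a one-parameter linear model, and hypothetical per-device data drawn from $y = 2x$. Real deployments typically add richer models, client sampling, and secure aggregation.

```python
# Hypothetical per-device data: (x, y) pairs drawn from y = 2x.
CLIENT_DATA = [
    [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)],
    [(0.5, 1.0), (1.5, 3.0)],
    [(2.0, 4.0), (4.0, 8.0)],
]


def local_train(w: float, data: list[tuple[float, float]],
                lr: float = 0.01, steps: int = 5) -> float:
    """Gradient descent on the mean squared error of y = w * x,
    using only this device's data."""
    for _ in range(steps):
        grad = sum(2 * x * (w * x - y) for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_averaging(rounds: int = 20) -> float:
    """Server loop: push the global w, collect locally trained values,
    and aggregate them as a data-size-weighted average."""
    w_global = 0.0
    total = sum(len(d) for d in CLIENT_DATA)
    for _ in range(rounds):
        local_ws = [local_train(w_global, d) for d in CLIENT_DATA]  # on-device
        w_global = sum(len(d) * w for d, w in zip(CLIENT_DATA, local_ws)) / total
    return w_global


print(round(federated_averaging(), 3))  # 2.0
```

Only the parameter `w` leaves each device; the raw `(x, y)` pairs stay local.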

Data analytics for IoT extends these distributed architectures further.

In edge computing, for example, heavy computing tasks are offloaded to edge nodes, while client devices such as IoT sensors are assigned lightweight tasks whose outputs feed the heavyweight tasks running at the edge nodes.

A local differential privacy obfuscation framework can help ensure both data privacy and data utility in edge computing.

Its basic approach is still to add noise so that private information does not leak.

However, adding noise can reduce the utility of the data, whereas a feature distillation model can limit the collection of personal data while still maximizing the utility of the data.
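The "add noise" idea can be illustrated with randomized response, a classic local differential privacy mechanism; the framework described here may use a different mechanism. Each user perturbs their own bit before it leaves the device, and the server still recovers an unbiased estimate of the population fraction. The data and parameter values below are hypothetical.

```python
import math
import random


def randomized_response(bit: int, epsilon: float, rng: random.Random) -> int:
    """Report the true bit with probability e^eps / (1 + e^eps), otherwise
    the flipped bit; this satisfies epsilon-local differential privacy."""
    p_truth = math.exp(epsilon) / (1 + math.exp(epsilon))
    return bit if rng.random() < p_truth else 1 - bit


def estimate_fraction(reports: list[int], epsilon: float) -> float:
    """Unbiased server-side estimate of the true fraction of 1s."""
    p = math.exp(epsilon) / (1 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    return (observed - (1 - p)) / (2 * p - 1)


rng = random.Random(0)
true_bits = [1] * 30_000 + [0] * 70_000   # hypothetical: 30% have the attribute
reports = [randomized_response(b, math.log(3), rng) for b in true_bits]
estimate = estimate_fraction(reports, math.log(3))
print(round(estimate, 3))
```

A smaller `epsilon` flips more bits, which strengthens privacy but widens the error of the estimate: the privacy/utility trade-off described above.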

The fundamental components of this framework involve learning features of the data using data minimization, and perturbing the identified features with local differential privacy techniques to preserve privacy.

In addition, the features are anonymized into bit strings using hash functions, so that the transformation generates unique strings.

Finally, the data is transmitted to the edge server, where the hash functions are used for feature reconstruction and distribution estimation, so that sensitive attributes are not exposed.
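One possible reading of this hash-then-perturb pipeline is sketched below. It uses symmetric unary encoding, a RAPPOR-style technique, and the framework's actual encoding may differ. Each client hashes its feature value into one of `K` buckets, encodes it as a one-hot bit string, and flips each bit randomly. The edge server then estimates the distribution over buckets. `K`, `bucket`, and the sample values are hypothetical.

```python
import hashlib
import math
import random

K = 16  # hypothetical number of hash buckets (length of each bit string)


def bucket(value: str) -> int:
    """Map a feature value to one of K buckets with a hash function."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:4], "big") % K


def encode_and_perturb(value: str, epsilon: float, rng: random.Random) -> list[int]:
    """Client side: one-hot bit string at the value's bucket; each bit is
    kept with probability p and flipped otherwise (epsilon-LDP overall)."""
    p = math.exp(epsilon / 2) / (1 + math.exp(epsilon / 2))
    b = bucket(value)
    bits = [1 if j == b else 0 for j in range(K)]
    return [bit if rng.random() < p else 1 - bit for bit in bits]


def estimate_distribution(reports: list[list[int]], epsilon: float) -> list[float]:
    """Edge-server side: unbiased estimate of the fraction of users per bucket."""
    p = math.exp(epsilon / 2) / (1 + math.exp(epsilon / 2))
    q = 1 - p
    n = len(reports)
    return [(sum(r[j] for r in reports) / n - q) / (p - q) for j in range(K)]


rng = random.Random(1)
values = ["flu"] * 6000 + ["cold"] * 3000 + ["covid"] * 1000  # hypothetical
reports = [encode_and_perturb(v, 2.0, rng) for v in values]
est = estimate_distribution(reports, 2.0)
print(bucket("flu"), round(est[bucket("flu")], 2))
```

With a small `K`, distinct values can share a bucket and become indistinguishable to the server. A larger `K` reduces such collisions at the cost of longer bit strings.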

## Encryption technology for data privacy

In addition to distributed architectures such as federated learning, fully homomorphic encryption (FHE) and secure multi-party computation are encryption technologies that can be used to protect private data.

Fully homomorphic encryption is an encryption scheme that allows analytical functions to operate directly on encrypted data while producing an encrypted result that, once decrypted, matches the result of performing the same function on the plaintext.
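A full FHE scheme is too large for a short example, so this sketch swaps in Paillier encryption. Paillier is only *partially* homomorphic: it supports addition of plaintexts, not arbitrary functions. It still shows the core idea of computing on ciphertexts. The primes are toy values and offer no security; the readings are hypothetical.

```python
import math
import random

# Toy parameters: far too small to be secure, for illustration only.
P, Q = 293, 433
N = P * Q
N2 = N * N
LAM = math.lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # valid because the generator g = N + 1 is used below


def encrypt(m: int, rng: random.Random) -> int:
    """Paillier encryption with g = N + 1: c = (1 + N)^m * r^N mod N^2."""
    while True:
        r = rng.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    """m = L(c^lambda mod N^2) * mu mod N, where L(x) = (x - 1) // N."""
    return ((pow(c, LAM, N2) - 1) // N) * MU % N


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts (mod N)."""
    return (c1 * c2) % N2


rng = random.Random(42)
readings = [120, 135, 110]   # hypothetical private values
total_ct = encrypt(0, rng)
for x in readings:
    total_ct = add_encrypted(total_ct, encrypt(x, rng))
print(decrypt(total_ct))  # 365
```

The party holding `total_ct` computes the sum without ever seeing an individual reading; only the key holder can decrypt the result.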

While this is exciting from a security and privacy perspective, with the current state of the art FHE computation is several orders of magnitude slower than the equivalent plaintext computation.

Even so, this is already a big improvement over earlier FHE schemes.

Given the potential benefits of FHE for cloud computing, standardization of fully homomorphic encryption is under way.

Secure multi-party computation allows multiple parties to jointly compute functions of common interest over their private data. It is well suited to machine learning, because it lets companies offer their models for inference on customers’ private data while preserving as much privacy as possible.
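A basic building block of many secure multi-party computation protocols is additive secret sharing, sketched below for a joint sum. It assumes semi-honest parties and simulates message passing in one process. `PRIME` is a hypothetical modulus choice. Python's `random` module is not cryptographically secure; a real system would use the `secrets` module or an equivalent CSPRNG.

```python
import random

PRIME = 2**61 - 1  # hypothetical field modulus for the shares


def share(secret: int, n_parties: int, rng: random.Random) -> list[int]:
    """Split a secret into n random shares that sum to it mod PRIME.
    Any n - 1 shares are uniformly random and reveal nothing."""
    shares = [rng.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares


def secure_sum(secrets: list[int], rng: random.Random) -> int:
    n = len(secrets)
    # Party i splits its input and (conceptually) sends share j to party j.
    all_shares = [share(s, n, rng) for s in secrets]
    # Party j adds up the shares it received; each looks random on its own.
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)]
    # Publishing the partial sums reveals only the total.
    return sum(partial_sums) % PRIME


rng = random.Random(7)
print(secure_sum([52_000, 61_500, 47_250], rng))  # 160750
```

Only addition is shown here. Multiplications, which model inference needs, require extra interaction between the parties and are a main source of the communication overhead mentioned below.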

Many secure multi-party computation protocols involve a large amount of message-passing overhead, and research into cheap, efficient, and effective multi-party computation techniques is ongoing.

There are also attempts to combine the two techniques into a hybrid solution with acceptable time and communication complexity.