Block encryption with offset/nonce

Provide two functions that encrypt and decrypt a block of data. The functions have to match the following signatures:

void encrypt(void *dst, const void *src, u64 len, const void *keys, const void *ofs);
void decrypt(void *dst, const void *src, u64 len, const void *keys, const void *ofs);

dst, src and len specify the destination, source and length of the data to be encrypted/decrypted. If you can, please allow destination and source to be the same address.

keys can be either the 256bit key or a larger set of round keys. If you use round keys, they are limited to 4KiB.

ofs is a 256bit parameter that can be a combined offset/nonce. The first 8 bytes shall be interpreted as a little-endian byte offset. For example, when encrypting 512 bytes with an offset of 1024 and using 16B blocksize, the offset for each block is 1024, 1040, 1056, 1072, etc.
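The per-block offsets in that example can be computed as in the sketch below. The helper names load_le64 and block_offset are hypothetical, and the block size is passed in as a parameter because the spec leaves it to each candidate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t u64; /* assumption: u64 is a 64-bit unsigned integer */

/* Interpret the first 8 bytes of ofs as a little-endian byte offset.
 * Reading byte by byte works on any host endianness. */
static u64 load_le64(const void *ofs)
{
    const unsigned char *p = ofs;
    u64 v = 0;

    assert(ofs != NULL);
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Byte offset of the n-th block (n starts at 0) within one call. */
static u64 block_offset(const void *ofs, u64 block_size, u64 n)
{
    return load_le64(ofs) + n * block_size;
}
```

With the first 8 bytes of ofs set to 00 04 00 00 00 00 00 00 (1024) and a block size of 16, block_offset yields 1024, 1040, 1056, 1072 for n = 0..3. The last of the 32 blocks in a 512-byte call is at 1520.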

The obvious use for the offset is as the disk address for disk encryption. It can also be the offset within a stream. The caller can use the remaining 192 bits (24 bytes) as a nonce, if desired.
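On the caller side, filling the 32-byte ofs parameter could look like the sketch below. The helper make_ofs and the choice to zero-fill an unused nonce are my assumptions; the spec only fixes the meaning of the first 8 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t u64; /* assumption: u64 is a 64-bit unsigned integer */

enum { OFS_BYTES = 32, NONCE_BYTES = 24 };

/* Build the 256bit ofs parameter: 8 bytes of little-endian byte offset
 * followed by 24 bytes of caller-chosen nonce. Passing nonce == NULL
 * zero-fills the nonce (an assumption, not mandated by the spec). */
static void make_ofs(unsigned char out[OFS_BYTES], u64 byte_offset,
                     const unsigned char *nonce)
{
    assert(out != NULL);
    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char)(byte_offset >> (8 * i));
    if (nonce)
        memcpy(out + 8, nonce, NONCE_BYTES);
    else
        memset(out + 8, 0, NONCE_BYTES);
}
```

For disk encryption, a caller would pass the byte address of the sector, for example sector 3 of a disk with 4096B sectors gives byte_offset 12288.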

Block size must be a power of two and no larger than 4096 bytes. 512 bytes or smaller would be convenient to support old 512B disk sectors, but most of the world seems to have lost interest in those. Feel free to go all the way to 4096B if that provides a meaningful advantage.
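The block size constraint is easy to check mechanically. The sketch below uses hypothetical helpers valid_block_size and block_shift. The second one illustrates one practical benefit of power-of-two sizes: a byte offset turns into a block index with a plain shift.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64; /* assumption: u64 is a 64-bit unsigned integer */

/* Nonzero if bs is a power of two and no larger than 4096 bytes. */
static int valid_block_size(u64 bs)
{
    return bs != 0 && bs <= 4096 && (bs & (bs - 1)) == 0;
}

/* log2 of a valid block size, so that
 * byte_offset >> block_shift(bs) is the block index. */
static unsigned block_shift(u64 bs)
{
    unsigned shift = 0;

    assert(valid_block_size(bs));
    while ((bs >> shift) != 1)
        shift++;
    return shift;
}
```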

Evaluation

For performance, the time to encrypt and decrypt a 1MiB block will be used to determine the fastest candidate. Candidates using expanded round keys have to pay for key expansion as well.
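A measurement along those lines could be sketched as follows. This is not the official harness: the names cipher_fn, null_cipher and time_roundtrip are hypothetical, clock() measures CPU time with coarse resolution, and a serious benchmark would repeat the run many times. null_cipher only copies data and stands in for a real candidate.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

typedef uint64_t u64; /* assumption: u64 is a 64-bit unsigned integer */
typedef void (*cipher_fn)(void *dst, const void *src, u64 len,
                          const void *keys, const void *ofs);

enum { MIB = 1024 * 1024 };

/* Placeholder that only copies data; NOT a cipher. */
static void null_cipher(void *dst, const void *src, u64 len,
                        const void *keys, const void *ofs)
{
    (void)keys;
    (void)ofs;
    memmove(dst, src, (size_t)len);
}

/* CPU seconds for one in-place encrypt + decrypt of 1MiB, or -1.0 on
 * allocation failure or if the round trip does not restore the input. */
static double time_roundtrip(cipher_fn enc, cipher_fn dec,
                             const void *keys, const void *ofs)
{
    unsigned char *plain = malloc(MIB);
    unsigned char *buf = malloc(MIB);
    double secs = -1.0;

    assert(enc != NULL && dec != NULL);
    if (plain && buf) {
        for (size_t i = 0; i < MIB; i++)
            plain[i] = (unsigned char)(i * 31u);
        memcpy(buf, plain, MIB);

        clock_t start = clock();
        /* A round-key candidate would run its key expansion here,
         * inside the timed region. */
        enc(buf, buf, MIB, keys, ofs);
        dec(buf, buf, MIB, keys, ofs);
        clock_t end = clock();

        if (memcmp(buf, plain, MIB) == 0)
            secs = (double)(end - start) / CLOCKS_PER_SEC;
    }
    free(plain);
    free(buf);
    return secs;
}
```

Calling time_roundtrip(null_cipher, null_cipher, key, ofs) gives a baseline for the cost of touching 1MiB twice, which is useful for judging how much of a candidate's time is actual encryption work.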

For security, all candidates are considered equally secure until somebody finds an attack faster than brute-forcing a 256bit key.

Rationale

In my opinion, all classical block cipher modes of operation are problematic. The one decent mode is XTS, and I still don't consider it good. The problem is that we start with simple block encryption that does not use an offset, and XTS tries to build something decent on top of that foundation. The right approach would be to push the offset into block encryption. In other words, the rules of the AES competition were wrong.

In a way, our problem is one of statistics. With ECB, the same input block always generates the same output block and an attacker can easily create a large dictionary of input/output pairs. Add the offset parameter and two identical inputs at different offsets generate different outputs. The attacker now needs a dedicated dictionary for every offset. The problem is several orders of magnitude harder. Add the nonce parameter and the entire approach is hopeless.

If callers decide to ignore the nonce parameter or reuse the same value, we still get the security guarantees of XTS, which are quite decent.

For practical reasons, turning random/accidental bit flips into entire blocks getting corrupted is useful as well. A lot of software has reached reliability levels where hardware failures become a dominant cause of crashes. If a single bit is flipped, it is hard to attribute the cause to hardware. Various software bugs could have the same effect. But with encrypted memory, a bit flip in hardware becomes a corrupted block after decryption and is much easier to attribute.

Round keys

If you decide to use expanded round keys, you do not have to provide your own method for key expansion. Challenge 3 is to provide a key expansion method and you can simply use the fastest candidate from there. Or if you provide your own, consider adding it as a candidate to challenge 3.