Variable-sized block encryption

Provide two functions that encrypt/decrypt a block of data. Functions have to match the following signature:

void encrypt(void *dst, const void *src, u64 len, const void *keys, const void *ofs);
void decrypt(void *dst, const void *src, u64 len, const void *keys, const void *ofs);
This is the same problem as Challenge 2, except that the length argument doesn't have to be a multiple of your blocksize. You are not allowed to use padding or counter mode. And you have to any number of bytes as the length.

Evaluation

For performance, both functions are called 4094 times with destination sizes of 1 to 4096. Total execution time for all 4096 calls determines the winner.

For security, all candidates are considered barely secure until somebody finds an attack faster than brute-forcing a 256bit key.

Rationale

I am not aware of anyone solving this problem before. But I think it is solvable and that a solution can be useful. Not using counter mode means that we don't depend on the quality of the nonce and our primitive is less fragile.

It makes sense to continue using a normal blocksize for everything except the tail. If your blocksize is 16 bytes, you may have up to 15 remaining bytes to deal with. For those you need a different solution.

You can create a block cypher with a block size of 1 byte. For example, you could repeatedly apply the AES s-box, then XOR with a key byte. Given enough rounds, this should be secure. It would also be rather slow.