Cryptography Guidelines
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License because it took bloody ages to write.
Background
This document outlines recommendations for cryptographic algorithm choices and parameters as well as important implementation details based on what I have learnt from reading about the subject and the consensus I have observed online. Note that some knowledge of cryptography is required to understand the terminology used in these guidelines.
My goal with these guidelines is to provide a resource that I wish I had access to when I first started writing programs related to cryptography. If this information helps prevent even just one vulnerability, then I consider it time well spent.
Note
This document is slowly being rewritten and split into individual pages. Please view the sections folder for the latest information.
Acknowledgements
These guidelines were inspired by this Cryptographic Best Practices gist, Latacora’s Cryptographic Right Answers, and Crypto Gotchas, which is licensed under the Creative Commons Attribution 4.0 International License. The difference is that I mention newer algorithms and have tried to justify my algorithm recommendations whilst also offering important notes about using them correctly.
Contribute
If you find these guidelines helpful, please star this repository and share the link around. Doing so might just prevent someone from making a catastrophic mistake.
If you have any feedback, please contact me privately here or publicly here to help improve these guidelines. Pull requests are also welcome but please be prepared for things to be reworded.
Disclaimer
I’m a psychology undergraduate with an interest in applied cryptography, not an experienced cryptographer. I primarily have experience with the libsodium library since that’s what I’ve used for my projects, but I’ve also reported some security vulnerabilities related to cryptography.
Most experienced cryptographers don’t have the time to write things like this, and the following information is freely available online or in books, so whilst more experience would be beneficial, I’m trying my best to provide accurate information that can be fact checked. If I’ve made a mistake, please contact me to get it fixed.
Note that the rankings are based on my opinion, algorithm availability in cryptographic libraries, and which algorithms are typically used in modern protocols, such as TLS 1.3, Noise Protocol Framework, WireGuard, and so on. Such protocols and recommended practices make for the best guidelines because they’ve been approved by experienced professionals.
General Guidance
-
Research, research, research: you often don’t need to know how cryptographic algorithms work under the hood to implement them correctly, just like how you don’t need to know how a car works to drive. However, you need to know enough about what you’re trying to do, which requires looking up relevant information online or in books, reading the documentation for the cryptographic library you’re using, reading RFC standards, reading helpful blog posts, and reading guidelines like this one. Furthermore, reading books about the subject in general will be beneficial, again like how knowing about cars can help if you break down. For a list of great resources, check out my How to Learn About Cryptography blog post.
-
Check and check again: it’s your responsibility to get things right the first time around to the best of your ability rather than relying on peer review. Therefore, I strongly recommend always reading over security sensitive code at least twice and testing it to ensure that it’s operating as expected (e.g. checking the value of variables line by line using a debugger, using test vectors, etc).
-
Peer review is great but often doesn’t happen: unless your project is popular, you have a bug bounty program with cash rewards, or what you’re developing is for an organisation, very few people, perhaps none, will look through the code to find and report vulnerabilities. Similarly, receiving funding for a code audit will probably be impossible.
-
Please don’t create your own custom cryptographic algorithms (e.g. a custom cipher or hash function): this is like flying a Boeing 747 without a pilot license but worse because even experienced cryptographers design insecure algorithms, which is why cryptographic algorithms are thoroughly analysed by a large number of cryptanalysts, usually as part of a competition. By contrast, you rarely see experienced airline pilots crashing planes. The only exception to this rule is implementing something like Encrypt-then-MAC with secure, existing cryptographic algorithms when you know what you’re doing.
-
Please avoid coding existing cryptographic algorithms yourself (e.g. coding AES yourself): cryptographic libraries provide access to these algorithms for you to prevent people from making mistakes that cause vulnerabilities and to offer good performance. Whilst a select few algorithms are relatively simple to implement, like HKDF, many aren’t and require a great deal of experience to implement correctly. Lastly, another reason to avoid doing this is that it’s not much fun since academic papers and reference implementations can be very difficult to understand.
Cryptographic Libraries
Use (in order):
-
Libsodium: a modern, extremely fast, easy-to-use, well documented, and audited library that covers all common use cases, except for implementing TLS. However, it’s much bigger than Monocypher, meaning it’s harder to audit and not suitable for constrained environments, and requires the Visual C++ Redistributable to work on Windows.
-
Monocypher: another modern, easy-to-use, well documented, and audited library, but it's about half the speed of libsodium on desktops/servers, has no misuse resistant functions (e.g. libsodium's secretstream() and secretbox()), only supports Argon2i for password hashing, allowing for insecure parameters (please see the Password Hashing/Password-Based Key Derivation Notes section), and offers no memory locking, random number generation, or convenience functions (e.g. Base64/hex encoding, padding, etc). However, it's compatible with libsodium whilst being much smaller, portable, and fast for constrained environments (e.g. microcontrollers).
-
Tink: a misuse resistant library that prevents common pitfalls, like nonce reuse. However, it doesn’t support hashing or password hashing, it’s not available in as many programming languages as libsodium and Monocypher, the documentation is a bit harder to navigate, and it provides access to some algorithms that you shouldn’t use.
-
LibHydrogen: a lightweight, easy-to-use, hard-to-misuse, and well documented library suitable for constrained environments. The downsides are that it’s not compatible with libsodium whilst also running slower than Monocypher. However, it has some advantages over Monocypher, like support for random number generation, even on Arduino boards, and easy access to key exchange patterns, among other things.
Avoid (in order):
-
A random library (e.g. with 0 stars) on GitHub: assuming it’s not been written by an experienced professional and it’s not a libsodium or Monocypher binding to another programming language, you should generally stay away from less popular, unaudited libraries. They are much more likely to suffer from vulnerabilities and be significantly slower than the more popular, audited libraries. Also, note that even experienced professionals make mistakes.
-
OpenSSL: very difficult to use, let alone use correctly, offers access to algorithms and functions that you shouldn’t use, the documentation is a mess, and lots of vulnerabilities have been found over the years. These issues have led to OpenSSL forks and new, non-forked libraries that aim to be better alternatives if you need to implement TLS.
-
The library available in your programming language: most languages provide access to old algorithms (e.g. MD5 and SHA1) that shouldn't be used anymore instead of newer ones (e.g. BLAKE2, BLAKE3, and SHA3), which can lead to poor algorithm choices. Furthermore, the APIs are typically easy to misuse, the documentation may fail to mention important security related information, and the implementations will be slower than libsodium. However, certain languages, such as Go and Zig, have impressive modern cryptography support.
-
Other popular libraries I haven’t mentioned (e.g. BouncyCastle, CryptoJS, etc): these again often provide or rely on dated algorithms and typically have bad documentation. For instance, CryptoJS uses an insecure KDF called EVP_BytesToKey() in OpenSSL when you pass a string password to AES.encrypt(), and BouncyCastle has no C# documentation. However, this recommendation is too broad really since there are some libraries that I haven’t mentioned that are worth using, like PASETO. Therefore, as a rule of thumb, if it doesn’t include several of the algorithms I recommend in this document, then it’s probably bad. Just do your research and assess the quality of the documentation. There’s no excuse for poor documentation.
-
NaCl: an unmaintained, less modern, and more confusing version of libsodium and Monocypher. For example, crypto_sign() for digital signatures has been experimental for several years. It also doesn’t have password hashing support and is difficult to install/package.
-
TweetNaCl: unmaintained, slower than Monocypher, doesn’t offer access to newer algorithms, doesn’t have password hashing, and doesn’t zero out buffers.
Notes:
-
If the library you’re currently using/planning to use doesn’t support several of the algorithms I’m recommending, then it’s time to upgrade and take advantage of the improved security and performance benefits available to you if you switch.
-
Please read the documentation: don’t immediately jump into coding something because that’s how mistakes are made. Good libraries have high quality documentation that will explain potential security pitfalls and how to avoid them.
-
Some libraries release unauthenticated plaintext when using AEADs: for example, OpenSSL and BouncyCastle apparently do. Firstly, don’t use these libraries for this reason and the reasons I’ve already listed. Secondly, never do anything with unauthenticated plaintext; ignore it to be safe.
-
Older doesn’t mean better: you can argue that older algorithms are more battle tested and therefore proven to be a safe choice, but the reality is that most modern algorithms, like ChaCha20, BLAKE2, and Argon2, have been properly analysed at this point and shown to offer security and performance benefits over their older counterparts. Therefore, it doesn’t make sense to stick to this overly cautious mindset of avoiding newer algorithms, except for algorithms that are still candidates in a competition (e.g. new post-quantum algorithms), which do need further analysis to be considered safe.
-
You should prioritise speed: this can make a noticeable difference for the user. For example, a C# Argon2 library is going to be significantly slower than Argon2 in libsodium, meaning unnecessary and unwanted extra delay during key derivation. Libsodium is the go-to for speed on desktops/servers, and Monocypher is the go-to for constrained environments (e.g. microcontrollers).
Symmetric Encryption
Use (in order):
-
XChaCha20-then-BLAKE2b (Encrypt-then-MAC): if you know what you are doing, then implementing Encrypt-then-MAC offers better security than an AEAD because it provides additional security properties, such as key commitment, and allows for a longer authentication tag, making it more suitable for long-term storage. This combo is now being employed by PASETO, an alternative to JWT, as well as my file encryption software called Kryptor. ChaCha20 has a higher security margin than AES whilst also being fast in software and running in constant time, meaning it's not vulnerable to timing attacks like AES can be. Moreover, Salsa20, the cipher ChaCha20 was based on, underwent rigorous analysis as part of the eSTREAM competition, making it into the final portfolio. Salsa20 has also received further analysis since then.
-
XChaCha20-Poly1305: this is the gold standard for when you don't know how to implement Encrypt-then-MAC or need maximum performance on all devices. As mentioned above, ChaCha20 has a higher security margin than AES, always runs in constant time, and (X)ChaCha20-Poly1305 is faster than AES-GCM without AES-NI hardware support. Note that XChaCha20-Poly1305 should be favoured over regular ChaCha20-Poly1305 in many cases because it allows for random nonces, which helps prevent nonce reuse (please see point 1 of the Notes section). If you just need a counter nonce or intend to use a unique key for encryption each time, then ChaCha20-Poly1305 is fine. Unfortunately, there are two ChaCha20-Poly1305 constructions - the original ChaCha20-Poly1305 and ChaCha20-Poly1305-IETF. The original construction is arguably better because it has a smaller nonce, meaning it doesn't encourage unsafe random nonces, and a larger internal counter, meaning it can encrypt more data using the same key and nonce pair (please see point 5 of the Notes section), but the IETF variant is more popular and should therefore almost always be used. A minimal usage sketch follows this list.
-
XSalsa20-Poly1305: although (X)ChaCha20 has slightly better diffusion and performance and has seen more adoption in recent years, (X)Salsa20 is, practically speaking, just as secure, with the same benefits as (X)ChaCha20 over AES (please see points 1 and 2). It has received a large amount of cryptanalysis (more than ChaCha20) and is still considered one of the best alternatives to AES.
-
AES-CTR (or CBC)-then-HMAC (Encrypt-then-MAC): again, if you know what you are doing, this is superior to using an AEAD in terms of security for the reasons outlined in point 1 above. AES-CTR should be preferred because AES-CBC is less efficient, requires padding, doesn’t support a counter nonce, and can’t encrypt as many blocks before a collision occurs. However, both AES-CTR-then-HMAC and AES-CBC-then-HMAC can be faster than AES-GCM without AES-NI hardware support. With that said, generating an IV for CBC and CTR can be a source of trouble, with CBC requiring unpredictable (aka random) IVs and CTR implementations differing in terms of nonce size and whether a random/counter nonce is safe.
-
AEGIS-256: one of the finalists for the CAESAR competition. It's much faster than AES-GCM and (X)ChaCha20-Poly1305 with hardware support, expected to be key committing, and supports safe random nonces. In Zig, it even performs better than (X)ChaCha20-Poly1305 and AES-GCM without hardware support. However, it's not compactly committing because of the short 128-bit tag, so Encrypt-then-MAC is still preferable for security. It has also received little adoption at the time of writing and isn't available in many cryptographic libraries. With that said, it will be in the next release of libsodium (1.0.19-stable) and will hopefully be standardised given its advantages over AES-GCM.
-
AES-OCB: another one of the finalists for the CAESAR competition. It performs well compared to AES-GCM and (X)ChaCha20-Poly1305 with hardware support, supports random nonces, has been researched for over a decade, the design is efficient and timing-attack resistant (assuming the block cipher implementation is), and it’s available in some cryptographic libraries. However, it’s slower than AEGIS and not key committing.
-
AES-GCM: the industry standard despite not being the best and receiving various criticism. It's easier to use correctly than Encrypt-then-MAC and faster than (X)ChaCha20-BLAKE2b, (X)ChaCha20-Poly1305, XSalsa20-Poly1305, and AES-CTR-then-HMAC/AES-CBC-then-HMAC with AES-NI hardware support. However, it's slow without hardware support, it has a weird nonce size (96 bits) that means you should use a counter nonce, some implementations incorrectly allow 128-bit nonces (only use a 96-bit nonce since longer nonces get hashed, which could result in multiple nonces producing some of the same AES-CTR output), reusing a nonce is more catastrophic than in AES-CBC for example, and there are relatively small max encryption limits for a single key (e.g. ~350 GB when using 16 KB long messages). Furthermore, there can be side-channels in software implementations, and mitigating them reduces the speed of the algorithm. Therefore, AES-GCM should only be used when there's hardware support, although I strongly recommend the above algorithms instead regardless.
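To make the XChaCha20-Poly1305 recommendation above concrete, here's a minimal sketch in Go using the golang.org/x/crypto/chacha20poly1305 package. It generates a random 192-bit nonce and prepends it to the ciphertext, as recommended in the Notes below. The function names are illustrative; treat this as a sketch rather than a vetted implementation.

```go
package example

import (
	"crypto/rand"
	"errors"

	"golang.org/x/crypto/chacha20poly1305"
)

// encrypt seals plaintext with XChaCha20-Poly1305 and returns nonce || ciphertext || tag.
func encrypt(key, plaintext, additionalData []byte) ([]byte, error) {
	aead, err := chacha20poly1305.NewX(key) // the key must be 32 bytes of high-entropy data
	if err != nil {
		return nil, err
	}
	// The 192-bit nonce is large enough to be generated randomly without fear of reuse.
	nonce := make([]byte, chacha20poly1305.NonceSizeX)
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Seal appends the ciphertext and tag to the nonce, so the nonce is prepended.
	return aead.Seal(nonce, nonce, plaintext, additionalData), nil
}

// decrypt reads the prepended nonce and opens the ciphertext, verifying the tag.
func decrypt(key, message, additionalData []byte) ([]byte, error) {
	aead, err := chacha20poly1305.NewX(key)
	if err != nil {
		return nil, err
	}
	if len(message) < chacha20poly1305.NonceSizeX+chacha20poly1305.Overhead {
		return nil, errors.New("message too short")
	}
	nonce, ciphertext := message[:chacha20poly1305.NonceSizeX], message[chacha20poly1305.NonceSizeX:]
	return aead.Open(nil, nonce, ciphertext, additionalData)
}
```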
Avoid (not in order because they’re all bad):
-
Your own custom symmetric encryption algorithm: even experienced cryptographers design insecure algorithms, which is why cryptographic algorithms are thoroughly analysed by a large number of cryptanalysts, usually as part of a competition.
-
AES-ECB: identical plaintext blocks get encrypted into identical ciphertext blocks, which means the algorithm lacks diffusion and fails to hide data patterns. In other words, it’s horribly insecure in the vast majority of contexts.
-
RC4: there are lots of attacks against it, rendering it horribly insecure.
-
Unauthenticated AES-CBC, AES-CTR, ChaCha20, and other unauthenticated ciphers without a MAC: this allows an attacker to tamper with the ciphertext without detection and can sometimes allow for other attacks, like padding oracle attacks in the case of AES-CBC.
-
One-time pad: completely impractical since the key needs to be the same size as the message, and a true random number generator (e.g. atmospheric noise) is required to generate the keystream for it to be impossible to decrypt. Furthermore, some people incorrectly assume an XOR cipher with a repeating key is equivalent to a one-time pad, but this is horribly insecure. Never do this.
-
Kuznyechik: it has a flawed S-Box, with no design rationale ever being made public, which is likely a backdoor. This algorithm is available in VeraCrypt, but I've luckily not seen it used anywhere else. Never use it or any program/protocol relying on it.
-
Blowfish, CAST-128, GOST, IDEA, 3DES, DES, RC2, and any cipher with a 64-bit block size: a 64-bit block size means collision attacks can be performed after encrypting a certain amount of data using the same key. Don’t use any algorithm with a block size less than 128 bits. Using algorithms with an even larger block size (e.g. ChaCha20 and Salsa20, which are stream ciphers that operate using 512-bit blocks) is even more preferable because a 128-bit block size can still lead to collisions eventually. Algorithms like DES and 3DES are also very old and have small key sizes that are insecure (please see the Symmetric Key Size section).
-
AES-CCM, AES-EAX, AES-CFB, AES-OFB, Serpent, Threefish, Twofish, Camellia, RC6, ARIA, SEED, and other ciphers nobody uses: very few people use these because they’re worse in one way or another. For example, AES-CCM uses MAC-then-Encrypt and CBC-MAC, AES-EAX is slower than AES-GCM and uses OMAC, AES-OFB can be insecure since two messages can end up using the same keystream, some of them are unbalanced in terms of security to performance (e.g. Serpent is slower whilst having a high security margin), some have received limited cryptanalysis, and implementations of uncommon non-AES algorithms are very rare in mainstream cryptographic libraries, with random implementations found on GitHub being less likely to be secure because these types of algorithms can be hard to implement correctly.
-
AES-XTS, AES-XEX, AES-LRW, AES-CMC, AES-EME, and other wide block/disk encryption only modes: these are not suitable for encrypting data in transit. They should only be used for disk encryption, with AES-XTS being preferred since it's popular, more secure than some other disk encryption modes, and less malleable than AES-CBC and AES-CTR (tampering causes random, unpredictable changes to the plaintext). Note that ordinary authentication using an AEAD or Encrypt-then-MAC cannot be used for disk encryption because it would require extra storage and slow down read/write speeds, among other things.
-
MORUS, Ascon, ACORN, Deoxys-II, COLM, and non-finalist CAESAR competition ciphers: MORUS doesn’t provide the expected security level, non-finalists should generally never be used, and these finalists are all essentially unavailable in cryptographic libraries. By contrast, AEGIS-256 and AES-OCB have gained some traction, which is why I’m now recommending them.
-
Rocca: extremely fast, key committing, and supports safe random nonces, but it hasn’t received proper analysis yet since it’s a new scheme.
-
AES-GCM-SIV and AES-SIV: these don't provide unlimited protection against nonce reuse like some people believe, they're slower than regular AES-GCM, they're rarely available in cryptographic libraries, they rely on MAC-then-Encrypt, and AES-SIV uses CMAC and takes a larger key. If you're concerned about nonces repeating, then you should use XChaCha20-then-MAC, XChaCha20-Poly1305, or AEGIS-256 with a randomly generated nonce or a nonce derived alongside a subkey for encryption using a KDF or MAC, as described here. If that isn't possible for some reason, then use AES-GCM-SIV.
Notes:
-
Never reuse a nonce/IV with the same key (e.g. never hardcode a nonce/IV): doing so is catastrophic to security. You must either use a counter nonce, a KDF generated nonce/IV, or a randomly generated nonce/IV, depending on the algorithm you're using. For instance, you should use a counter nonce (e.g. starting with 12 bytes of zeroes) with ChaCha20-Poly1305 and AES-GCM because the small nonce size (64 or 96 bits) means random nonces are not safe unless you're encrypting a small amount of data per key. By contrast, you can use a random or counter nonce safely with XChaCha20-Poly1305 because it has a large nonce size (192 bits). AES-CBC requires an unpredictable (aka random) 128-bit IV, and some implementations of AES-CTR need a random nonce too, although most involve using a 64- or 96-bit counter nonce for the reasons explained above. Note that if you always rotate the key before encrypting (never encrypting anything with the same key more than once), then you can get away with using a nonce full of zeroes (e.g. 12 bytes of zeroes for AES-GCM), but I generally wouldn't recommend doing this, especially if you have to use a 128-bit key, which I again don't recommend (please see the Symmetric Key Size section), since this can lead to multi-target attacks.
-
Prepend the nonce/IV to the ciphertext: this is the recommended approach because it’s read before the ciphertext and doesn’t need to be kept secret. However, if you’re performing key wrapping (encrypting a key using another key), as described in point 6 below, then you could encrypt the nonce/IV too as an additional layer of protection.
-
Never use string variables for keys, nonces, IVs, and passwords: these parameters should always be byte arrays. String keys are essentially just passwords, meaning they’re not suitable for use as keys directly (please see the Password Hashing/Password-Based Key Derivation section). Furthermore, strings are immutable (unchangeable) in many programming languages (e.g. C#, Java, JavaScript, Go, etc), meaning they can’t be zeroed out from memory (please see point 7 below).
-
Avoid encryption functions/APIs that include a password parameter: these often use dated or insecure password-based KDFs that shouldn’t be used. Instead, use one of the recommended password-based KDFs (please see the Password Hashing/Password-Based Key Derivation section) yourself to derive an encryption key for an AEAD or an encryption key and MAC key for Encrypt-then-MAC.
-
Ciphers have limits on the amount of data they can safely encrypt using a single key: for AES-GCM, you can encrypt ~64 GB using a key and nonce pair for one message (don’t reuse the nonce, as explained in point 1 above) and ~350 GB (assuming 16 KB messages) with a single key. For ChaCha20-Poly1305-IETF, you can encrypt 256 GB using a key and nonce pair for one message, but there’s no practical limit for a single key (2^64 bytes). XChaCha20-Poly1305 and non-IETF ChaCha20-Poly1305 have no practical limits (~2^64 bytes). Then with AES-CTR, you can encrypt ~2^64 bytes, and with AES-CBC, you can encrypt ~2^47 bytes. Make sure you follow the recommendations below to ensure that these limits are never reached.
-
Ideally, use a new key for each message (except when chunking the same message, as explained in point 8 below): this helps prevent cryptographic wear-out (using a single key to encrypt too much data), nonce reuse, and reusing keys with multiple algorithms whilst being beneficial for security in that a compromise of one key doesn’t compromise data encrypted under different keys. One common way of doing this is to randomly generate a unique data encryption key (DEK) for each message, encrypt the DEK using a key encryption key (KEK) derived using a key derivation function (KDF), and then prepend the encrypted DEK to the ciphertext. For decryption, you derive the KEK, use it to decrypt the encrypted DEK, and use the DEK to decrypt the ciphertext. Alternatively, you can derive unique keys using a random salt with a KDF, although this is inefficient when using a password-based KDF since it means a delay for every message.
-
Erase secret keys from memory as soon as possible: once you’ve finished using a secret key, it should be zeroed out from memory to prevent an attacker with physical or remote access to a machine being able to retrieve it. Note that in garbage collected programming languages, such as C#, Go, and JavaScript, this is difficult to achieve because the garbage collector can copy secrets around in memory. Locking memory via an external library can solve this problem. Also, always disable compiler optimisations for the zero memory method. Even without locking memory, attempting to erase sensitive data from memory is better than doing nothing.
-
Encrypt large amounts of data in chunks (e.g. 16-64 KiB): this lowers memory usage, can be faster for Encrypt-then-MAC, allows for more encryptions under the same key with AEADs, reduces theoretical attack boundaries for AEADs, means that a corruption in a ciphertext might only affect one chunk rather than rendering the entire message unrecoverable, and enables the detection of tampered chunks before an entire message is sent or read. However, this is tricky to get right because you need to add and remove padding in the last chunk (e.g. using an encrypted header to store the length of padding or a padding scheme, as explained in point 13 below) and prevent chunks from being truncated (e.g. using the total ciphertext length as additional data), reordered, duplicated, or removed (e.g. using a counter nonce that's incremented for each chunk), so you should ideally use or replicate an existing API, like secretstream() in libsodium.
-
Don’t just use a standardised AEAD (AES-GCM, (X)ChaCha20-Poly1305, XSalsa20-Poly1305, AES-GCM-SIV, AES-OCB, etc) if you’re performing password-based encryption in an online scenario: most AEADs are not key committing, meaning they are susceptible to partitioning oracle attacks. In summary, an attacker can generate a ciphertext that successfully decrypts under multiple different keys. By recursively submitting such a ciphertext to an oracle (a server that knows the key and returns an error), an attacker can guess a large number of passwords at once, speeding up a password search. To solve this problem, you can either use Encrypt-then-MAC following the instructions later on in this Notes section or apply a fix for a non-committing AEAD. There are currently no standardised committing AEADs, and they would not be truly committing unless the tag was large enough to be collision-resistant (e.g. 256 bits), which is why Encrypt-then-MAC is still preferable. The simplest mitigation involves hashing the key and prepending the hash to the ciphertext, but this leaks the identity of the key unless you include a salt. The fix I’d recommend involves deriving an encryption key and a MAC key using a KDF, encrypting the message using the AEAD with the encryption key, retrieving the authentication tag from the end of the ciphertext, and prepending a MAC of the encryption key, nonce, and AEAD authentication tag to the ciphertext (e.g.
HMAC(message: encryptionKey || nonce || tag, key: macKey)
). For decryption, you derive the encryption key and MAC key again, read the AEAD authentication tag, and verify the MAC in constant time (see point 18 below) before decrypting the message using the AEAD. An example of this fix can be found here. -
Standardised AEADs (AES-GCM, (X)ChaCha20-Poly1305, XSalsa20-Poly1305, AES-GCM-SIV, AES-OCB, etc) aren’t key or compactly committing: if an algorithm is key committing, then an attacker cannot generate a ciphertext that successfully decrypts using multiple keys. If an algorithm is compactly committing, then someone cannot find two different messages and two different encryption keys that lead to the same tag. These are properties often intuitively expected from AEADs, and the lack of these properties can cause rare problems such as partitioning oracle attacks for password-based encryption in some online scenarios, deanonymisation when using hybrid encryption in some online scenarios, invisible salamander attacks in some online scenarios, and decryption to different but valid plaintexts. To fix this problem, you should use Encrypt-then-MAC as explained in these guidelines or apply the fix for AEADs outlined above in point 9.
-
Make use of the additional data parameter in AEADs: this parameter is useful for binding context information to a ciphertext and preventing issues like replay attacks and confused deputy attacks. It’s often used to authenticate things like headers, version numbers, timestamps, and message counters. Note that additional data is not part of the ciphertext; it’s just information included in the computation of the authentication tag. You either need to store additional data securely in some sort of database (e.g. in the case of a user’s email address being used as additional data) or be able to reproduce the additional data when it’s time for decryption (e.g. using a file name as additional data).
-
If an attacker knows the encryption key, then they can still decrypt an AEAD encrypted message without knowing the additional data: for example, they can use AES-CTR with the key to decrypt an AES-GCM encrypted message, ignoring the authentication tag and additional data.
-
Pad messages before encryption if you want to hide their length: stream ciphers, such as ChaCha20 and AES-CTR (used in AES-GCM), don’t perform any padding, meaning the ciphertext is the same length as the plaintext. This generally isn’t a concern for most applications, but when it is, you should use ISO/IEC 7816-4 padding on the message before encryption and remove the padding after decryption. This padding scheme is more resistant to some types of attacks than other padding algorithms and always reversible, unlike zero padding. Such padding can be randomised or deterministic, with both techniques having pros and cons. Randomised padding is typically better for obscuring the usage of cryptography (e.g. making an encrypted file look like random data). Encrypting data in chunks, as described in point 8 above, is an example of deterministic padding since the last chunk will always be padded to the size of a chunk. PADMÉ padding is another type of deterministic padding with minimal storage overhead, but it doesn’t pad small messages.
-
Encrypt-then-Compress is pointless and Compress-then-Encrypt can leak information: high-entropy (random) data can’t be compressed, and Compress-then-Encrypt can leak the compression ratio of the plaintext, which led to the CRIME and BREACH attacks on TLS.
-
Stick to Encrypt-then-MAC: don't MAC-then-Encrypt or Encrypt-and-MAC because both can be susceptible to attacks, whereas Encrypt-then-MAC is always secure when implemented correctly (a sketch follows these notes). Encrypt-then-MAC is the standard approach and is what's used in non-SIV (aka most) AEADs. The only exception to this rule is when implementing an SIV AEAD to have nonce-misuse resistance, but you should ideally let a library do that for you.
-
Always use separate keys for authentication and encryption: this is considered good practice, even though reusing the same key may be theoretically fine. In the case of a password-based KDF, this can be done by using a larger output length (e.g. 96 bytes) and splitting the output into two keys (e.g. 256-bit and 512-bit). In the case of a non-password-based KDF, you can use the KDF twice with the same input keying material but different context information and output lengths for domain separation. Please see the Symmetric Key Size section for details on what key size you should use for encryption and MACs.
-
Always MAC the nonce/IV and everything in the message (e.g. file headers too): if you fail to authenticate the nonce/IV, then an attacker can tamper with it undetected. AEADs always authenticate the nonce for this reason.
-
Always compare secrets and MACs in constant time: if you don’t compare the authentication tags in constant time, then this can lead to timing attacks that allow an attacker to calculate a valid tag for a forged message. Libraries like libsodium have constant time comparison functions that you can use to prevent this.
-
Concatenating multiple variable length parameters when using a MAC (e.g. `HMAC(message: additionalData || ciphertext, key: macKey)`) can lead to attacks: please see point 5 of the Message Authentication Codes Notes section.
-
Cipher agility is harmful: less is more in the case of supporting multiple ciphers/algorithms because more choices means more can go wrong, which is one reason why WireGuard is regarded as superior to OpenVPN and TLS 1.3 supports fewer algorithms than TLS 1.2. Cipher agility has caused serious problems, like in the case of JWTs. Also, in the case of programs like GPG and VeraCrypt, customisation allows the user to worsen their security. Therefore, choose one secure Encrypt-then-MAC combo or AEAD recommended above, and that’s it. If the algorithm you chose gets broken, which is extremely unlikely if you’re following these guidelines, then you can just increment the protocol/format version number and switch to a different algorithm.
-
Cascade encryption is unnecessary: although I’ve written a cascade encryption library based on TripleSec called DoubleSec, cascade encryption is significantly slower and solves a problem that pretty much doesn’t exist because algorithms like ChaCha20 and AES are nowhere near broken and other issues are more likely to cause problems. Furthermore, it’s a hassle to implement yourself compared to using a single algorithm, with more things that can go wrong. Therefore, unless you’re extremely paranoid (e.g. in an Edward Snowden type situation) and don’t care about speed at all, please don’t bother.
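To tie points 15-18 together, here's a minimal Encrypt-then-MAC sketch in Go, pairing XChaCha20 with keyed BLAKE2b-512 and deriving separate encryption and MAC keys from one master key via HKDF-SHA512. The context strings and message layout are illustrative assumptions, not a standard, so this won't interoperate with libsodium-style libraries.

```go
package example

import (
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha512"
	"errors"
	"io"

	"golang.org/x/crypto/blake2b"
	"golang.org/x/crypto/chacha20"
	"golang.org/x/crypto/hkdf"
)

// deriveKeys derives a 256-bit encryption key and a 512-bit MAC key from a single
// high-entropy master key, using distinct context strings for domain separation (note 16).
func deriveKeys(masterKey []byte) (encKey, macKey []byte, err error) {
	encKey = make([]byte, chacha20.KeySize)
	macKey = make([]byte, 64)
	if _, err = io.ReadFull(hkdf.New(sha512.New, masterKey, nil, []byte("encryption")), encKey); err != nil {
		return nil, nil, err
	}
	if _, err = io.ReadFull(hkdf.New(sha512.New, masterKey, nil, []byte("authentication")), macKey); err != nil {
		return nil, nil, err
	}
	return encKey, macKey, nil
}

// encrypt returns nonce || ciphertext || tag, with the tag covering the nonce
// and the entire ciphertext (note 17).
func encrypt(masterKey, plaintext []byte) ([]byte, error) {
	encKey, macKey, err := deriveKeys(masterKey)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, chacha20.NonceSizeX) // 192-bit nonce: safe to randomise
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	stream, err := chacha20.NewUnauthenticatedCipher(encKey, nonce)
	if err != nil {
		return nil, err
	}
	ciphertext := make([]byte, len(plaintext))
	stream.XORKeyStream(ciphertext, plaintext)
	mac, err := blake2b.New512(macKey)
	if err != nil {
		return nil, err
	}
	mac.Write(nonce)
	mac.Write(ciphertext)
	output := make([]byte, 0, len(nonce)+len(ciphertext)+blake2b.Size)
	output = append(output, nonce...)
	output = append(output, ciphertext...)
	return mac.Sum(output), nil
}

// decrypt verifies the tag in constant time before decrypting anything (note 18).
func decrypt(masterKey, message []byte) ([]byte, error) {
	if len(message) < chacha20.NonceSizeX+blake2b.Size {
		return nil, errors.New("message too short")
	}
	encKey, macKey, err := deriveKeys(masterKey)
	if err != nil {
		return nil, err
	}
	nonce := message[:chacha20.NonceSizeX]
	ciphertext := message[chacha20.NonceSizeX : len(message)-blake2b.Size]
	tag := message[len(message)-blake2b.Size:]
	mac, err := blake2b.New512(macKey)
	if err != nil {
		return nil, err
	}
	mac.Write(nonce)
	mac.Write(ciphertext)
	if !hmac.Equal(tag, mac.Sum(nil)) { // hmac.Equal is a constant time comparison
		return nil, errors.New("invalid authentication tag")
	}
	stream, err := chacha20.NewUnauthenticatedCipher(encKey, nonce)
	if err != nil {
		return nil, err
	}
	plaintext := make([]byte, len(ciphertext))
	stream.XORKeyStream(plaintext, ciphertext)
	return plaintext, nil
}
```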
Discussion:
Not everyone will agree with my recommendation to use Encrypt-then-MAC over AEADs when possible for the following reasons:
-
It’s easier to implement an AEAD: you don’t need to worry about deriving separate keys, appending and removing the authentication tag, and comparing authentication tags in constant time. AEADs also make it easy to use additional data in the calculation of the authentication tag. This should mean fewer mistakes.
-
AEADs are typically faster: AES-GCM with AES-NI instruction set support is very fast, AES-OCB, AEGIS, and Rocca are even faster, and ChaCha20-Poly1305 is also fast without any reliance on hardware support.
-
It’s easier to chunk data with an AEAD: Encrypt-then-MAC normally involves encrypting all the data in one go and appending one authentication tag at the end, which requires loading the entire message into memory and means a corruption renders the entire message unrecoverable. Whilst you can also do this with AEADs, it’s recommended to chunk messages, as explained in point 8 of the Notes, meaning the ciphertext contains multiple authentication tags. This is trickier with Encrypt-then-MAC unless you’re using a library that offers it as a function.
My response to these arguments is:
-
Yes, AEADs are simpler, which is exactly why we need committing AEADs and Encrypt-then-MAC implementations to be standardised and included in cryptographic libraries. Unfortunately, this isn’t happening because everyone is busy promoting non-committing AEADs.
-
Whilst this is often true (except for AEADs like AES-GCM without AES-NI support), Encrypt-then-MAC, especially with fast MACs like BLAKE2b and BLAKE3, is not slow enough for this to be a serious problem, particularly in non-interactive/offline scenarios or when dealing with long-term storage. In fact, using BLAKE3 with a large enough amount of data can be faster than Poly1305 and GMAC. Moreover, I would argue that the additional security makes up for any loss in speed. AEADs are not designed for long-term storage, as indicated by the small nonces and tags, whereas Encrypt-then-MAC is.
-
This is another reason why Encrypt-then-MAC implementations like (X)ChaCha20-BLAKE2b should be included in cryptographic libraries. If they were, then you could call them like any other AEAD. For instance, I made ChaCha20-BLAKE2b and ChaCha20-BLAKE3 libraries to allow me to do this.
So when should you use an AEAD? Exceptions to my Encrypt-then-MAC recommendation include when:
-
Maximum performance is necessary: for example, in online scenarios where there’s a large key space (e.g. passwords aren’t being used) and data is not being stored long-term, such as TLS 1.3 and WireGuard. This is what AEADs are designed for. However, with non-committing AEADs and a small key space in an online scenario, things like partitioning oracle attacks and deanonymisation may be possible.
-
You’re not comfortable implementing Encrypt-then-MAC: if there’s no decent library you can use (e.g. Tink isn’t available in your programming language) or copy code from (make sure you respect the code license!), then you’re more likely to implement an AEAD correctly. However, implementing the fix I recommend for partitioning oracle attacks (please see point 9 of the Notes), which affect online password-based encryption scenarios, requires knowing how to use a MAC, so at that point, you may as well use Encrypt-then-MAC, especially if you’re storing data long-term. With enough research and attention to detail, Encrypt-then-MAC can be implemented correctly by anyone.
Message Authentication Codes
Use (in order):
-
Keyed BLAKE2b-256 or keyed BLAKE2b-512: faster than HMAC and SHA3, yet as real-world secure as SHA3. Furthermore, BLAKE2 relies on essentially the same core algorithm as BLAKE, which received a significant amount of cryptanalysis, even more than Keccak (the SHA3 finalist), as part of the SHA3 competition. Despite being one of the best candidates on paper, it didn't win the SHA3 competition because the design was more similar to that of SHA2. However, this is arguably a good thing since SHA2 is still secure after many years of cryptanalysis. Lastly, it's available in many cryptographic libraries and has become increasingly popular in software (e.g. it's used in Argon2 and many other password hashing schemes). A minimal usage sketch follows this list.
-
HMAC-SHA256 or HMAC-SHA512: slower and older than BLAKE2 but well-studied. HMAC-SHA2 is also faster than SHA3, extremely popular in software, and available in practically every cryptographic library. However, unlike BLAKE, BLAKE2, BLAKE3, and SHA3, SHA2 was designed behind closed doors at the NSA rather than being the result of an open competition, with no design rationale in the standard.
-
SHAKE256: this is faster than regular SHA3 and similar in speed to SHA2, which is why the Keccak (SHA3) team and the go/x/crypto documentation recommend it over regular SHA3 for most applications. Using it as a MAC requires concatenating a fixed-length key and the message to authenticate (`SHAKE256(key || message)`). Use a 256-bit output length to get an equivalent security level to SHA3-256 and SHA256. A 512-bit output length provides 256-bit collision resistance but still only 256-bit preimage and second preimage resistance, which is less than SHA3-512 and SHA512, not that this is a practical concern since 2^256 is impossible to reach.
-
KMAC256, SHA3-256, or SHA3-512: SHA3 is slower in software than BLAKE2, BLAKE3, SHA2, HMAC-SHA2, and SHAKE but has a higher security margin and is fast in hardware. If KMAC256 is available in your cryptographic library, then you should use it with a 256-bit or 512-bit output length (please see point 3 above since the same security level applies) because it's like HMAC for SHA3. Otherwise, you should concatenate the fixed-length key and message (`SHA3(key || message)`) to construct a SHA3 MAC because HMAC-SHA3 is needlessly inefficient given that SHA3 is already a MAC. The worse performance and less accessible variants make it hard to recommend over HMAC-SHA2.
-
Keyed BLAKE3-256: faster than BLAKE2, SHA2, SHAKE, and SHA3, but it has a smaller security margin, only targets the 128-bit security level, and isn’t available in many cryptographic libraries. Therefore, I’d only recommend this when speed is of utmost importance because it’s not conservative.
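As a quick illustration of the keyed BLAKE2b recommendation above, here's a minimal sketch in Go using golang.org/x/crypto/blake2b; the function name is just for illustration.

```go
package example

import "golang.org/x/crypto/blake2b"

// computeTag returns a 512-bit authentication tag over message using a secret
// key (blake2b.New512 accepts keys up to 64 bytes, i.e. up to 512 bits).
func computeTag(key, message []byte) ([]byte, error) {
	mac, err := blake2b.New512(key)
	if err != nil {
		return nil, err
	}
	mac.Write(message)
	return mac.Sum(nil), nil
}
```

Remember to compare the resulting tag in constant time (e.g. with crypto/hmac's Equal function in Go), as explained in the Symmetric Encryption Notes.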
Avoid (not in order because they’re all bad):
-
Regular, unencrypted hashes (e.g. `SHA256(ciphertext)`): this is insecure because unkeyed hashes don't provide authentication.
-
Regular, encrypted hashes (e.g. `AES-CTR(SHA256(ciphertext))`): this is insecure. For example, with a stream cipher, you could flip bits in the ciphertext hash.
-
`SHA2(key || message)`: this is vulnerable to length extension attacks, as discussed in point 3 of the Hashing Notes section. Technically speaking, `SHA2(message || key)` works as a MAC if the attacker doesn't know the key, but it's weaker than constructions like HMAC because it requires the hash function to be collision-resistant rather than a pseudorandom function and therefore shouldn't be used. Newer hash functions, like BLAKE2, SHA3, and BLAKE3, are resistant to length extension attacks and could be used to perform `Hash(key || message)` safely, but you should still just use a keyed hash function when possible to do the work for you.
-
Meow Hash: this is insecure, as explained by this cryptanalysis blog post.
-
HMAC-MD5 and HMAC-SHA1: MD5 and SHA1 should no longer be used for anything.
-
Poly1305 and other polynomial MACs: these are easier to misuse than the recommended algorithms (e.g. Poly1305 requires a secret, unique, and unpredictable key each time that’s independent from the encryption key). They also produce small tags that are designed for online protocols and small messages.
-
CBC-MAC: this is unpopular and often implemented incorrectly because it has weird requirements that most people are probably completely unaware of, allowing for attacks. Even when implemented correctly, the recommended algorithms are better.
-
CMAC/OMAC: almost nobody uses this, even though it improves on CBC-MAC in terms of preventing mistakes. Furthermore, it only produces a 128-bit tag.
-
128-bit keyed hashes or HMACs: you shouldn’t go below a 256-bit output length with hash functions because a 128-bit security level should be the minimum, and 128-bit authentication tags only provide 64-bit collision resistance.
-
SHAKE128 and KMAC128: these only provide at best 128-bit preimage and second preimage resistance regardless of the output length, which is lower than a typical 256-bit hash. Therefore, use SHAKE256/KMAC256 with a 256-bit or 512-bit output length to obtain 256-bit preimage and second preimage resistance.
-
Keyed BLAKE2s: in most cases, you’ll want to use BLAKE2b, which is faster on 64-bit platforms, does more rounds, and can produce larger digests. Only use BLAKE2s if you’re hashing on 8- to 32-bit platforms since that’s what it’s designed for.
Notes:
-
Please read points 15-18 of the Symmetric Encryption Notes for guidance on implementing a MAC correctly.
-
Please read point 2 of the Symmetric Key Size Use section for guidance on what key size to use.
-
A 256-bit authentication tag is sufficient for most use cases: however, a 512-bit tag provides additional security if you’re concerned about quantum computing. I wouldn’t recommend bothering with an output length in-between (e.g. HMAC-SHA384) because that’s not common, and you may as well go all the way to get a 256-bit security level.
-
Append the authentication tag to the ciphertext: this is common practice and how AEADs operate.
-
Concatenating multiple variable length parameters (e.g. `HMAC(message: additionalData || ciphertext, key: macKey)`) can lead to attacks: if you fail to concatenate the lengths of the parameters (e.g. `HMAC(message: additionalData || ciphertext || additionalDataLength || ciphertextLength, key: macKey)`, with the lengths converted to a fixed number of bytes, such as 4 bytes to represent an integer, consistently in either big- or little-endian, regardless of the endianness of the machine), then your implementation will be susceptible to canonicalization attacks because an attacker can shift bytes between the different parameters whilst producing a valid authentication tag. AEADs do this length concatenation for you to prevent this, and a sketch of the technique follows these notes. Another potentially more efficient method of safely supporting multiple inputs is to iteratively MAC each input, using the previous tag as the key for the next MAC calculation and a different random key to MAC the final tag to prevent length extension attacks. This alternative is outlined here, and I have created a small library called MultiMAC that does this if you need further help.
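Here's a minimal sketch in Go of the length-concatenation fix from point 5, using HMAC-SHA512 and 8-byte little-endian length encodings (the note uses 4 bytes as an example; any fixed, consistent size works). The function name is illustrative.

```go
package example

import (
	"crypto/hmac"
	"crypto/sha512"
	"encoding/binary"
)

// multiInputTag MACs two variable-length inputs unambiguously by appending
// their lengths as fixed-size little-endian integers, preventing an attacker
// from shifting bytes between the inputs (a canonicalization attack).
func multiInputTag(macKey, additionalData, ciphertext []byte) []byte {
	mac := hmac.New(sha512.New, macKey)
	mac.Write(additionalData)
	mac.Write(ciphertext)
	// Encode each length as 8 little-endian bytes so the boundary between
	// the two inputs is fixed by the tag computation.
	lengths := make([]byte, 16)
	binary.LittleEndian.PutUint64(lengths[:8], uint64(len(additionalData)))
	binary.LittleEndian.PutUint64(lengths[8:], uint64(len(ciphertext)))
	mac.Write(lengths)
	return mac.Sum(nil)
}
```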
Symmetric Key Size
Use (not in order because they have different use cases):
-
256-bit keys: there’s essentially no reason not to use 256-bit keys for symmetric encryption. With AES-128, a 128-bit key doesn’t necessarily translate to 128-bit security due to batch attacks, this is the only available key size for most (X)ChaCha20 and (X)Salsa20 implementations, it’s the key size that’s used for top secret material by intelligence agencies and governments, and it’s now recommended for long-term storage due to concerns surrounding quantum computers being able to bruteforce 128-bit keys.
-
512-bit keys: it’s recommended to always use a key size as large as the output length for HMAC (e.g. a 512-bit key for HMAC-SHA512). This principle is a good rule to follow for MACs in general as it ensures that the key size doesn’t decrease the security provided by the output length. However, you can use a larger key size (e.g. a 512-bit key with a 256-bit output length) for domain separation when deriving keys.
Avoid (in order):
-
Smaller than 128-bit keys: this won’t stand the test of time and in some cases can already be bruteforced.
-
Symmetric encryption algorithms with large key sizes (e.g. Threefish): key sizes over 256 bits are widely regarded as unnecessary because they provide no practical security benefit. Furthermore, encryption algorithms supporting such key sizes are unpopular in practice.
-
128-bit keys: this is the minimum and provides better performance, but please just use 256-bit keys because they provide a higher security margin. AES-128 is less secure than AES-256 because it’s considerably faster to bruteforce, especially when you consider batch attacks. The argument that AES-128 is more secure than AES-256 due to certain attacks being more effective on AES-256 is incorrect because such attacks are not practical in the real world.
-
HMAC keys larger than the hash function block size (e.g. > 512 bits with HMAC-SHA256 and > 1024 bits with HMAC-SHA512): this causes the key to get hashed down to the output length of the hash function, which ironically reduces security compared to using a key as large as the block size.
Notes:
-
Symmetric keys must be kept secret: unlike with public-key cryptography, where you can share the public key safely, you must not share a symmetric key via an insecure (e.g. unencrypted) channel.
-
Keys must be uniformly random: they can either be randomly generated using a cryptographically secure pseudorandom number generator (please see the Random Numbers section) or derived using one of the recommended key derivation or password-based key derivation functions (please see the (Non-Password-Based) Key Derivation Functions and Password Hashing/Password-Based Key Derivation sections).
Random Numbers
Use (in order):
-
The cryptographically secure pseudorandom number generator (CSPRNG) in your programming language or cryptographic library: these should use the CSPRNG in your operating system. For example, RandomNumberGenerator() in C#, SecureRandom() in Java, Crypto.getRandomValues() in JavaScript, and so on. A minimal Go sketch follows this list.
-
A fast-key-erasure userspace RNG: this should be a last resort because it’s hard to collect entropy properly. A lot can go wrong if you don’t know what you’re doing. On embedded devices, allow a library like LibHydrogen to handle random number generation for you.
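As a minimal illustration of the first recommendation, this Go sketch reads from crypto/rand, which is backed by the operating system's CSPRNG; the helper name is illustrative.

```go
package example

import "crypto/rand"

// randomBytes returns size uniformly random bytes from the OS CSPRNG,
// suitable for keys, salts, and IDs.
func randomBytes(size int) ([]byte, error) {
	b := make([]byte, size)
	if _, err := rand.Read(b); err != nil {
		return nil, err
	}
	return b, nil
}
```

For example, `randomBytes(32)` produces a 256-bit key or salt, in line with the note below.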
Avoid (not in order because they’re both bad):
-
A non-cryptographically secure pseudorandom number generator: for example, Math.random() in JavaScript, Random.Next() in C#, Random() in Java, and so on. These are not secure and should not be used for anything related to security.
-
A custom RNG: this is likely going to be insecure because it’s harder to do properly than you’d think. Just trust the CSPRNG in your operating system.
Notes:
- Ideally, generate 256-bit random values for IDs, salts, etc: this reduces the chances of a collision into the realm of not having anything to worry about. By contrast, random 128-bit values will collide after roughly 2^64 values have been generated due to the birthday paradox.
Hashing
Use (in order):
-
BLAKE2b-512 or BLAKE2b-256: faster than MD5, SHA1, SHA2, and SHA3, yet as real-world secure as SHA3. Furthermore, BLAKE2 relies on essentially the same core algorithm as BLAKE, which received a significant amount of cryptanalysis, even more than Keccak (the SHA3 finalist), as part of the SHA3 competition. Despite being one of the best candidates on paper, it didn't win the SHA3 competition because the design was more similar to that of SHA2. However, this is arguably a good thing since SHA2 is still secure after many years of cryptanalysis. Lastly, it's available in many cryptographic libraries and has become increasingly popular in software (e.g. it's used in Argon2 and many other password hashing schemes). A minimal usage sketch follows this list.
-
SHA512, SHA512/256, or SHA256: SHA2 is the most popular hash function, meaning it’s widely available in cryptographic libraries, it’s still secure after many years of cryptanalysis besides length extension attacks (please see point 3 of the Notes section), and it offers decent performance.
-
SHAKE256: this is faster than regular SHA3 and similar in speed to SHA2, which is why the Keccak (SHA3) team and the go/x/crypto documentation recommend it over regular SHA3 for most applications. Use a 256-bit output length to get an equivalent security level to SHA3-256 and SHA256. A 512-bit output length provides 256-bit collision resistance but still only 256-bit preimage and second preimage resistance, which is less than SHA3-512 and SHA512, not that this is a practical concern since 2^256 is impossible to reach.
-
SHA3-512 or SHA3-256: slow in software but the new standard. It's fast in hardware, has a flexible construction that has been used to build other algorithms, is well analysed, is very different to SHA2, and has a higher security margin than the other algorithms listed here.
-
BLAKE3-256: the fastest cryptographic hash in software at the cost of having a lower security margin and being limited to a 128-bit security level. It’s also rarely available in cryptographic libraries. However, it improves on BLAKE2 in that there’s only one variant that covers all use cases (it’s a regular hash, PRF, MAC, KDF, and XOF), but depending on the cryptographic library you use, this probably isn’t something you’ll notice when using BLAKE2b anyway. I’d only recommend this when speed is of utmost importance because it’s not conservative.
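As a minimal illustration of the BLAKE2b recommendation above, here's an unkeyed hashing sketch in Go using golang.org/x/crypto/blake2b; the function name is illustrative.

```go
package example

import "golang.org/x/crypto/blake2b"

// hashMessage computes an unkeyed BLAKE2b-512 digest in one call.
// For BLAKE2b-256, use blake2b.Sum256 instead.
func hashMessage(message []byte) [blake2b.Size]byte {
	return blake2b.Sum512(message)
}
```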
Avoid (not in order because they’re all bad):
-
Non-cryptographic hash functions and error-detecting codes (e.g. CRC): the clue is in the name. These are not secure.
-
MD5 and SHA1: both are very old and no longer secure. For instance, there’s an attack that breaks MD5 collision resistance in 2^18 time, which takes less than a second to execute on an ordinary computer.
-
Streebog: it has a flawed S-Box, with no design rationale ever being made public, which is likely a backdoor. This algorithm is available in VeraCrypt, but I've luckily not seen it used anywhere else. Never use it or any program/protocol relying on it.
-
Insecure and non-finalist SHA3 competition candidates (e.g. EDON-R): if you want to use something from the SHA3 competition, then you should either use BLAKE2b (based on BLAKE, which was thoroughly analysed and deemed to have a very high security margin), SHA3 (the winner, very different to SHA2 in design, and has a very high security margin), or BLAKE3 (based on BLAKE2 but faster and with a lower security margin).
-
Chaining hash functions (e.g. `SHA256(SHA1(message))`): this can be insecure (e.g. SHA1 has worse collision resistance than SHA256, meaning a collision for SHA1 results in a collision for `SHA256(SHA1(message))`) and is obviously less efficient than hashing once. Just don't do this.
-
RIPEMD, RIPEMD-128, RIPEMD-160, RIPEMD-256, and RIPEMD-320: the original RIPEMD has collisions and RIPEMD-128 has a small output size, meaning they're insecure. The longer variants are still old and unpopular, most implementations are limited to small output lengths (e.g. 160-bit is the most common), and they have worse performance and have received less analysis compared to the recommended algorithms.
-
Whirlpool, SHA224, MD6, and other hashes nobody uses: these are all worse in one way or another than the recommended algorithms, which is why nobody uses them. For instance, Whirlpool is slower than most other cryptographic hash functions, SHA224 only provides 112-bit collision resistance, which is below the recommended 128-bit security level, MD6 didn’t make it to the second round of the SHA3 competition and has speed issues, and so on.
-
128-bit hashes: you shouldn’t go below a 256-bit output with hash functions to ensure 128-bit security. 128-bit hashes only provide a 64-bit security level.
-
SHAKE128: this only provides at best 128-bit preimage and second preimage resistance regardless of the output length, which is lower than a typical 256-bit hash. Therefore, use SHAKE256 with a 256-bit or 512-bit output length to obtain 256-bit preimage and second preimage resistance.
-
KangarooTwelve: much faster than SHA3 and SHAKE, has a safe security margin, and has no variants, but it’s rarely accessible and used, so you may as well just use SHAKE if you want something based on Keccak.
-
BLAKE2s: in most cases, you’ll want to use BLAKE2b, which is faster on 64-bit platforms, does more rounds, and can produce larger digests. Only use BLAKE2s if you’re hashing on 8- to 32-bit platforms since that’s what it’s designed for.
Notes:
-
These hash functions are not suitable for password hashing: these algorithms are fast, whereas password hashing needs to be slow to prevent bruteforce attacks. Furthermore, password hashing requires using a random salt for each password to derive unique hashes when given the same input and to protect against attacks using precomputed hashes.
-
These unkeyed hash functions are not suitable for authentication: you need to use MACs (please see the Message Authentication Codes section), such as keyed BLAKE2b-512 and HMAC-SHA512, for authentication because they provide the appropriate security guarantees.
-
SHA2 (except for SHA512/256 – SHA224 and SHA384 don't provide the same level of protection), MD5, SHA1, Whirlpool, RIPEMD-160, and MD4 are susceptible to length extension attacks: an attacker can use `Hash(message1)` and the length of `message1` to calculate `Hash(message1 || message2)` for an attacker-controlled `message2`, without knowing `message1`. Therefore, concatenating things (e.g. `Hash(secret || message)`) with these algorithms is a bad idea. Instead, BLAKE2b, SHA512/256, HMAC-SHA2, SHA3, HMAC-SHA3, or BLAKE3 should be used because none of these are susceptible to length extension attacks. Also, please read point 5 of the Message Authentication Codes Notes section because concatenating variable length parameters incorrectly can lead to another type of attack.
-
Hash functions do not increase entropy: if you hash a single ASCII character, then that means there are still only 128 possible values. Therefore, prehashing passwords before using a password-based KDF doesn’t improve the entropy of the password. This is also why inputs to hash functions need to be high in entropy in some contexts (e.g. using the hash of a keyfile as an encryption key).
Password Hashing/Password-Based Key Derivation
Use (in order):
-
Argon2id (64+ MiB of RAM, 3+ iterations, and 1+ parallelism): winner of the Password Hashing Competition in 2015, widely used and recommended now, and very easy to use in libraries like libsodium (see the sketch after this list). Use as high a memory size as possible and then as many iterations as possible to reach a suitable delay for your use case (e.g. a delay of 500 milliseconds for server authentication, 1 second for file encryption, 3-5 seconds for disk encryption, etc). The parallelism can’t be adjusted in libraries like libsodium and Monocypher, but higher values based on your CPU core count (e.g. a parallelism of 4) should be used when possible on machines that aren’t servers.
-
scrypt (N=32768, r=8, p=1 and higher): the parameters are more confusing and less scalable than Argon2, and it’s susceptible to cache-timing attacks. However, it’s still a strong algorithm when configured correctly.
-
bcrypt (12+ work factor): note that this is not a KDF because the output length cannot be adjusted. However, it’s stronger than Argon2 and scrypt at shorter runtimes (e.g. a 100ms delay for password hashing on a server) since it’s minimally cache-hard. For longer runtimes (e.g. 1 second), Argon2 and scrypt are better choices because memory-hardness then becomes more important, whereas bcrypt uses a small, fixed amount of memory. Most importantly, it blows PBKDF2 out of the water in terms of resisting GPU/ASIC attacks. Unfortunately, it has a stupid password length limit of 72 characters, meaning people often prehash the password using something like SHA-2 to support longer passwords. However, prehashing with an unsalted/unpeppered hash function (e.g. MD5, SHA-1, SHA-256) can lead to password shucking, and a raw prehash can contain null bytes, which many bcrypt implementations treat as a string terminator, allowing an attacker to find collisions that speed up attacks. Therefore, you should use hmac-bcrypt, which addresses these issues; for example, it Base64 encodes the prehash to avoid the null bytes problem.
-
PBKDF2-SHA-512 (200,000+ iterations): only use this when none of the better algorithms are available or due to compatibility constraints because it can be efficiently bruteforced using GPUs and ASICs when not using a high iteration count. Note that it’s generally recommended not to ask for more output than the output length of the underlying hash function because this can lead to attacks. Instead, if more output is required, use PBKDF2 first to get the output length of the underlying hash function (64 bytes with PBKDF2-SHA-512) and then call a non-password-based KDF, like HKDF-Expand, with the PBKDF2 output as the input keying material (IKM) to derive more output.
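As a sketch of the Argon2id recommendation above, here’s what password hashing and verification might look like with the argon2-cffi Python package (one option among several; libsodium bindings work similarly). The parameters mirror the minimums listed above and should be raised until hashing takes a suitable delay:

```python
# Requires: pip install argon2-cffi
from argon2 import PasswordHasher
from argon2.exceptions import VerifyMismatchError

# 64 MiB of RAM (memory_cost is in KiB), 3 iterations, 4 lanes of parallelism.
ph = PasswordHasher(time_cost=3, memory_cost=64 * 1024, parallelism=4)

# The encoded string stores the salt and parameters for you.
hash_string = ph.hash("correct horse battery staple")

try:
    ph.verify(hash_string, "correct horse battery staple")
except VerifyMismatchError:
    print("wrong password")

# Lets you transition to stronger parameters over time (see the Notes below).
if ph.check_needs_rehash(hash_string):
    hash_string = ph.hash("correct horse battery staple")
```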
Avoid (not in order because they’re all bad):
-
Storing passwords in plaintext: this is a recipe for disaster. If your password database is ever compromised, all your users are screwed, and your reputation in terms of security will go down the drain as well.
-
Using a password as a key (e.g. `key = Encoding.UTF8.GetBytes(password)` in C#): firstly, passwords are low in entropy, whereas cryptographic keys need to be high in entropy. Secondly, not using a password-based KDF with a random salt means attackers can quickly bruteforce passwords, and users using the same password will end up using the same key.
-
Using a regular/fast hash function (e.g. MD5, SHA-1, SHA-2, etc): these are not suitable for password hashing because they’re not slow, which allows for fast bruteforce attacks. Password hashing also requires using a salt to protect against attacks using precomputed hashes and to prevent the same password always having the same hash. However, adding a salt to certain regular hash functions, such as SHA-2, can lead to length extension attacks, as discussed in point 3 of the Hashing Notes section.
-
Encrypting passwords: encryption is reversible, whereas hashing is not. If an attacker compromises a password database and obtains a password hash, then they don’t know the password without computing the hash. By contrast, if an attacker compromises a password database and the relevant encryption key(s), then they can easily obtain the plaintext passwords. Encryption would also reveal the password length unless you padded the input.
-
PBKDF1: never use this as it was superseded by PBKDF2 and can only derive keys up to 160 bits, which is basically not suitable for anything. Some implementations, such as PasswordDeriveBytes() in C#, are also completely broken.
-
SHAcrypt: it’s weaker than the recommended algorithms, nobody uses this, and I’ve never even seen it in a cryptographic library.
-
PBKDF2-MD5, PBKDF2-SHA-1, PBKDF2-SHA-256, and PBKDF2-SHA-384: use SHA-512 if you must use PBKDF2. MD5 and SHA-1 are old hash functions that should not be used anymore. Then PBKDF2-SHA-256 and PBKDF2-SHA-384 require significantly more iterations than PBKDF2-SHA-512 to be secure and have a smaller block size, meaning long passwords may get prehashed.
-
Argon2i with less than 3 iterations: unlike Argon2id and Argon2d, Argon2i has been attacked, with 3+ iterations being required for the attack to not be efficient and 11+ iterations being required for the attack to completely fail. Argon2i is also weaker than both Argon2id and Argon2d when it comes to resistance against GPU/ASIC cracking. Therefore, as per the RFC, Argon2id should be used if you do not know the difference between the types or you consider side-channel attacks to be a viable threat but want better GPU/ASIC resistance because Argon2id offers the benefits of both Argon2i (side-channel resistance, albeit to a lesser extent) and Argon2d (GPU/ASIC resistance).
-
Chaining password hashing functions (e.g. `scrypt(PBKDF2(password))`): this just reduces the strength of the stronger algorithm since it means having worse parameters to get the same total delay.
-
Balloon hashing: arguably better than Argon2 since it’s similar in strength whilst having a more impressive design (e.g. no separate variants, resistance to cache attacks, easy to implement with standard cryptographic hash functions, and performant). Unfortunately, it has seen virtually no adoption. There seems to be no information on recommended parameters, the reference implementation is no longer maintained, there are no official test vectors, there’s no RFC draft, and only a handful of people have implemented the algorithm, with it not being in any popular libraries. Therefore, just use Argon2, which has now been standardised and widely adopted.
Notes:
-
Never hard-code passwords into source code: these can be easily retrieved.
-
Always use a random 128-bit or 256-bit salt: salts ensure that each password hash is different, which prevents an attacker from identifying two identical passwords without cracking the hashes. Moreover, salting defends against attacks that rely on precomputed hashes. The typical salt size is 128 bits, but 256-bit is also fine for further reassurance that the salt won’t repeat. Anything above that is excessive, and short salts can lead to salt reuse and allow for precomputed attacks, which defeats the point of salting.
-
Always use the highest parameters/delay you can afford: aim for a delay of at least 250 milliseconds, and note that in many cases you can and should go higher. For instance, PBKDF2 requires a very high iteration count because it’s not resistant to GPU/ASIC attacks, and if you’re performing a non-interactive operation (e.g. disk encryption), then you can afford longer delays, like 3-5 seconds.
-
Avoid string password variables: strings are immutable (unchangeable) in many programming languages (e.g. C#, Java, JavaScript, Go, etc), meaning they can’t be zeroed out from memory. Instead, use a char array if possible and convert that into a byte array for password hashing/password-based key derivation. Then erase both arrays from memory after you’ve finished using them. Note that this is also difficult in many programming languages, as explained in point 7 of the Symmetric Encryption Notes section, but attempting to erase sensitive data from memory is better than doing nothing.
-
Compare passwords in constant time: if you ever need to compare passwords (e.g. for password re-entry in a console application), then you should use a constant time comparison function to prevent timing attacks. Sometimes these functions require both arrays to be equal in length to work correctly, in which case you can compare two MACs of the passwords calculated using the same random key; just erase the key from memory afterwards.
-
Use a 256-bit and above output length: for password storage, a 128-bit hash is normally fine, but a 256-bit output provides a better security level for high entropy passwords. For key derivation, you should derive at least a 256-bit output and perhaps more, depending on whether you need to derive multiple keys (e.g. a 256-bit encryption key and a 512-bit MAC key).
-
Always store the parameters (e.g. memory size, iterations, and parallelism for Argon2) with the password hash: these values don’t need to be secret and are required to derive the correct hash. When storing passwords in a database, you should store these values for each user to verify the hashes and transition to stronger parameters over time as hardware improves. In some cryptographic libraries, this is done for you. By contrast, in a key derivation scenario, you can get away with using fixed parameters based on a version number stored as a header (e.g. file format v3 = 256 MiB of RAM and 12 iterations). Then if you want to change the parameters, you just increment the version number.
-
Perform client-side password prehashing for server relief or to hide the plaintext password from the server: when creating an account, the server can send a random salt to the client, which is used to perform password hashing on the client’s device. The server then performs server-side password hashing on the transmitted password hash using the same salt. Then the salt and final password hash are stored in the password database. When logging in, the server sends the stored salt to the client, the client performs client-side password hashing, the client transmits the password hash to the server, the server performs server-side password hashing using the stored salt, and then the server compares the result with the password hash stored in the database. For a non-existent user, the salt that’s sent should always be the same for a given username to avoid leaking which accounts exist, which can be achieved by computing a MAC (e.g. keyed BLAKE2b-512) of the username and using that as the fake salt.
-
Don’t use padding to hide the length of a password when sending it to a server: instead, perform client-side password hashing if possible (please see point 8 above). If that’s not possible, then you should hash the password using a regular hash function, with the largest possible output length (e.g. BLAKE2b-512), on the client’s device, transmit the hash to the server, and perform server-side password hashing, using the transmitted hash as the password. Both techniques ensure that the amount of data transmitted is constant and prevent the server effortlessly obtaining a copy of the password, but client-side password prehashing should be preferred as it allows for more secure password hashing parameters and provides additional security compared to if the server leaks/stores the client-side regular/fast hash of the password.
-
Use rate limiting to prevent denial of service (DoS) and bruteforce attacks: this involves temporarily blacklisting certain IP addresses and usernames from logging in to prevent the server being overwhelmed and to stop attackers bruteforcing passwords.
-
If a user can supply very long passwords, then this can lead to denial of service attacks: this happened to Django in 2013. To fix this, either enforce a password length limit (e.g. 128 characters is the max) or prehash passwords using a regular/fast hashing algorithm, with the highest possible output length (e.g. BLAKE2b-512), before performing password hashing.
-
Hash-then-Encrypt for additional security when storing passwords: you can use a password hashing algorithm on the password before encrypting the salt and password hash using an AEAD or Encrypt-then-MAC, with a secret key stored separately from the password database. This forces an attacker to decrypt the password hashes before trying to crack them. Furthermore, it means that if your secret key is ever compromised but the password hashes are not, then you can decrypt all the stored password hashes and re-encrypt them using a new key, which is easier than resetting every user’s password in the event of a pepper being compromised.
-
Use a pepper for additional security when deriving keys: a pepper is essentially a secret key that’s mixed with the password using a MAC (e.g. `HMAC-SHA512(message: password, key: pepper)`, as sketched below) before password hashing. In the case of password storage, using Hash-then-Encrypt makes more sense for the reason I explained above. By contrast, for key derivation, using a pepper is a great idea if possible because it means an additional secret is required, making a bruteforce more difficult. For instance, a keyfile in file/disk encryption software acts as a pepper, which improves the security of the key derivation assuming that the keyfile is stored correctly (e.g. on an encrypted memory stick away from the encrypted file/disk).
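Here’s a minimal sketch of the pepper construction above in Python, combining the standard library’s HMAC with argon2-cffi’s low-level API; the function name and parameter values are illustrative, not from any particular library:

```python
# Requires: pip install argon2-cffi
import hashlib
import hmac

from argon2.low_level import Type, hash_secret_raw

def derive_key(password: bytes, salt: bytes, pepper: bytes) -> bytes:
    # Mix the pepper into the password with a MAC before password hashing,
    # i.e. HMAC-SHA512(message: password, key: pepper).
    peppered = hmac.new(pepper, password, hashlib.sha512).digest()
    # Then run the password-based KDF (Argon2id) as usual.
    return hash_secret_raw(
        secret=peppered,
        salt=salt,
        time_cost=3,
        memory_cost=64 * 1024,  # in KiB, so 64 MiB
        parallelism=4,
        hash_len=32,            # 256-bit derived key
        type=Type.ID,
    )
```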
(Non-Password-Based) Key Derivation Functions
Use (in order):
-
Salted BLAKE2b: restricted to a 128-bit `salt` and a 128-bit (16 character) `personalisation` parameter for domain separation, which is annoying. However, you can feed more context information into the `message` parameter. Besides the weird context information size limit, this is easier to use than HKDF because there’s only one function rather than three, which can be confusing. Furthermore, please see the Hashing section for why BLAKE2b should be preferred over other hash functions. If there’s no KDF variant of BLAKE2b available in your library, then you can construct a BLAKE2b KDF using `BLAKE2b(message: salt || info || saltLength || infoLength, key: inputKeyingMaterial)`, with the `saltLength` and `infoLength` parameters being encoded as specified in point 5 of the Message Authentication Codes Notes section (see the sketch after this list). Like HKDF, this custom approach allows for salt and info parameters of practically any length.
-
HKDF-SHA512 or HKDF-SHA3-512: the most popular KDF with support for a larger salt and lots of context information. However, people get confused about the difference between the `Expand` and `Extract` functions, the `salt` parameter ironically shouldn’t be used to pass in the salt (please see point 5 of the Notes below), it doesn’t require a salt despite one being recommended and beneficial for security, and it’s slower than salted BLAKE2b. Please see the Hashing and Message Authentication Codes sections for a comparison between SHA2/SHA3 and HMAC-SHA2/HMAC-SHA3.
-
BLAKE3: as mentioned before, BLAKE3 has a lower security margin, but it also doesn’t have a salt parameter. With that said, very good guidance is given on how to produce globally unique and application specific context strings in the official GitHub repo. If you’d like to use a salt, then you can construct a custom KDF implementation as explained in point 1 above.
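Here’s a sketch of the custom BLAKE2b KDF construction from point 1, using Python’s standard library. Since the Message Authentication Codes Notes aren’t reproduced here, the length encoding is an assumption: fixed-size 64-bit little-endian values; use whatever fixed encoding your protocol specifies.

```python
import hashlib

def blake2b_kdf(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    # BLAKE2b(message: salt || info || saltLength || infoLength, key: IKM),
    # with each length encoded as an 8-byte little-endian integer (assumed).
    message = (
        salt
        + info
        + len(salt).to_bytes(8, "little")
        + len(info).to_bytes(8, "little")
    )
    return hashlib.blake2b(message, digest_size=length, key=ikm).digest()

# Illustrative usage with placeholder values for the master key and salt.
subkey = blake2b_kdf(
    ikm=bytes(32),
    salt=bytes(16),
    info=b"example-app 2024-01-01 encryption key",
)
```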
Avoid (not in order because they’re all bad):
-
Regular (salted or unsalted) hash functions: whilst this can be fine for deriving an encryption key from a Diffie-Hellman shared secret for example, it’s typically not recommended. Just use an actual KDF when possible as there’s less that can go wrong (e.g. there’s no risk of length extension attacks).
-
Password-based KDFs (e.g. PBKDF2): if you’re not using a password, then you shouldn’t be using a password-based KDF. Password-based KDFs are designed to be slow to prevent bruteforce attacks, whereas non-password-based KDFs are fast because they’re designed for high-entropy keys. Even with a small delay (e.g. 1 iteration of PBKDF2), this is likely slower and makes the code more confusing because an inappropriate function is being used.
-
HChaCha20 and HSalsa20: these are not general-purpose cryptographic hash functions, can only take a 256-bit key as input and output a 256-bit key, and are very rarely used, except in the case of implementing XChaCha20 and XSalsa20. If you want something based on ChaCha20, then use BLAKE2b or BLAKE3.
Notes:
-
These KDFs are not suitable for hashing passwords: they should be used for deriving multiple subkeys from a high-entropy master key or converting a shared secret concatenated with the public keys used to calculate the shared secret into a cryptographically strong secret key.
-
Using the same parameters besides changing the output length can result in related outputs (e.g. for HKDF and BLAKE3): this is exactly why you shouldn’t reuse the same parameters for different keys.
-
Use different contexts for different keys: a good format is `[application] [date and time] [purpose]` because this means the context information is application-specific and unique, which provides domain separation.
-
Salted BLAKE2b can use a counter salt: if you’re deriving multiple subkeys from a master key, then you can use a counter salt starting at 0 (16 bytes of 0s) that gets incremented for each subkey. However, if you’re deriving a single key, then you may want to use a random salt.
-
Counterintuitively, the `info` parameter should be used to provide the salt for HKDF: the `salt` parameter should be left null to get the standard security definition for HKDF. The `info` parameter should contain the unique context information for that subkey concatenated with a randomly generated 128-bit or 256-bit salt that’s used for all subkeys, as sketched below. If these parameters are not fixed in length, then follow the guidance in point 5 of the Message Authentication Codes Notes section. Using a secret salt, which is a bit like a pepper, further improves the security guarantees.
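Here’s what point 5 might look like in Python with the `cryptography` package; the application name, date, and salt handling are illustrative:

```python
# Requires: pip install cryptography
import os

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

master_key = os.urandom(32)  # placeholder high-entropy input keying material
salt = os.urandom(16)        # one random 128-bit salt shared by all subkeys

def derive_subkey(purpose: bytes, length: int = 32) -> bytes:
    # salt=None gives the standard HKDF security definition; the actual
    # salt travels in info, after the unique context information.
    return HKDF(
        algorithm=hashes.SHA512(),
        length=length,
        salt=None,
        info=b"example-app 2024-01-01 " + purpose + salt,
    ).derive(master_key)

encryption_key = derive_subkey(b"encryption")
mac_key = derive_subkey(b"authentication", length=64)
```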
Key Exchange/Hybrid Encryption
Use (in order):
-
Curve25519/X25519: popular, fast, easy to implement, fixes some issues with NIST curves, not designed by NIST, and offers ~128-bit security.
-
Curve448/X448: less popular and slower than X25519 but provides a 224-bit security level and is also not made by NIST. Generally, there’s not much reason to use this as a 128-bit security level is deemed enough for key exchange and quantum computers will break both X25519 and X448.
-
Pre-shared symmetric keys: this approach allows for post-quantum security and can be combined alongside an asymmetric key exchange. However, using pre-shared keys can be difficult since the key must be kept secret, whereas public keys are meant to be public and can therefore be easily shared.
-
X25519/X448 plus a post-quantum KEM: considering some post-quantum algorithms have been found to be considerably easier to attack than originally thought, it would be reckless to recommend switching to a post-quantum KEM alone when these algorithms need further analysis. Therefore, if you can’t use a pre-shared key but want to aim for post-quantum security, then you can concatenate the classical and post-quantum key exchange outputs and pass them through a secure KDF.
Avoid (not in order because they’re all bad):
-
Plain RSA, RSA PKCS#1 v1.5, RSA-KEM, and RSA-OAEP: plain/textbook RSA is insecure for several reasons, RSA PKCS#1 v1.5 is also vulnerable to some attacks, and RSA-KEM and RSA-OAEP, whilst both secure when implemented correctly, are still worse than using hybrid encryption because asymmetric encryption is slower, designed for small messages, doesn’t provide sender authentication without signatures, and requires larger keys. RSA-KEM is also never used and very rarely available in cryptographic libraries.
-
ElGamal: old, very rarely used, can only be used on small messages, produces a ciphertext that’s larger than the plaintext, the design is malleable, it’s slower than hybrid encryption, and it doesn’t provide sender authentication without signatures.
-
Unknown/unavailable curves (e.g. SIEC and Curve41417): some, such as SIEC, are completely unknown and have not received sufficient cryptanalysis, so they should be avoided at all costs. Then many curves are rarely used/available compared to Curve25519/X25519, P-256, P-384, and P-521. Please see the SafeCurves tables for a security comparison of most curves.
-
NIST curves (e.g. P-256, P-384, and P-521): although P-256 is probably the most popular curve, the seeds for these curves haven’t been explained, which is not a good look considering that Dual_EC_DRBG was a NIST standard despite containing an NSA backdoor. Furthermore, these curves require point validation and are harder to implement correctly, meaning libraries are more likely to contain vulnerabilities, and they’re slower than Curve25519/X25519, which has become increasingly popular over recent years (e.g. it’s used in TLS 1.3). These should only be used for interoperability reasons.
-
SRP, J-PAKE, and other PAKE protocols: note that these are only for password-based authenticated key exchange. SRP has an odd design, no meaningful security proof, cannot be instantiated on elliptic curves so is less efficient, is incompatible with TLS 1.3, and there have been many versions with vulnerabilities. Some PAKEs can allow for pre-computation attacks. Furthermore, very few cryptographic libraries include PAKEs, which makes good ones, like OPAQUE, difficult to recommend until they receive more adoption. Some people have argued PAKEs will not see widespread adoption, and I wouldn’t be surprised if that turns out to be the case.
-
Post-quantum algorithms: these are still being researched, aren’t implemented in mainstream libraries, are much slower than existing algorithms, and typically have very large key sizes. However, it will eventually make sense to switch to one in the future. For now, if post-quantum security is a goal, then use a pre-shared symmetric key if possible.
Notes:
-
Public keys should be shared, and private keys must be kept secret: never share private keys. Please see point 9 below for details about secure storage of private keys.
-
Never hard-code private keys into source code: these can be easily retrieved.
-
Use one of the recommended (non-password-based) KDFs on the shared secret with the public keys used to calculate the shared secret as part of the context information (e.g. `BLAKE2b-256(context: constant || publicKey1 || publicKey2, inputKeyingMaterial: sharedSecret)`): shared secrets are not suitable for use as secret keys directly because they’re not uniformly random. Moreover, you should include the public keys in the key derivation because multiple public keys can result in the same shared secret. By including the public keys, you improve the entropy of the derived key and ensure sender authentication. The libsodium key exchange API includes the public keys for you, but many libraries, like Monocypher, do not. Also, remember to derive unique keys each time by using the salt and context parameters, as explained in the (Non-Password-Based) Key Derivation Functions section.
-
For hybrid encryption, use one of the recommended key exchange algorithms above with one of the recommended symmetric encryption algorithms: for example, use X25519 with (X)ChaCha20-Poly1305.
-
When using counter nonces for encryption, use different keys for different directions in a client-server scenario: after computing the shared secret, you can use a non-password-based KDF to derive two 256-bit keys, e.g. `HKDF-SHA512(inputKeyingMaterial: sharedSecret, outputLength: 64, salt: null, info: clientPublicKey || serverPublicKey)`, splitting the output in two (see the sketch after these notes). One key should be used by the client for sending data to the server, and the other should be used by the server for sending data to the client. Both keys need to be calculated by the client and server. This approach allows counter nonces to be used safely for encryption without having to wait for an acknowledgement after every message.
-
X25519 and X448 public keys are distinguishable from random data: if you need to obfuscate public keys so they’re indistinguishable from random, then you need to use Elligator2 with ‘dirty’ keys. You cannot use vanilla/standard keys. The easiest way to do this involves using X25519 and Monocypher since libsodium doesn’t and probably never will support Elligator2 fully. Note that other metadata (e.g. the number of bytes in a packet) can reveal the use of cryptography too, so you should pad such information using randomised padding or a deterministic scheme, like PADMÉ.
-
Use an authenticated key exchange in most non-interactive/offline protocols: the Noise protocol framework K and X one-way handshake patterns, as explained here, are perfect for non-interactive/offline protocols. These achieve sender and recipient authentication whilst preventing a compromise of the sender’s private key leading to an attacker being able to decrypt the ciphertext.
-
Opt for forward secrecy when possible in interactive/online protocols: this prevents a compromise of a long-term private key leading to a compromise of a session key, which is the strongest security guarantee you can achieve. This can be implemented using the Noise KK or IK interactive handshakes.
-
Store private keys encrypted: when storing a private key in a file, you should always encrypt it with a strong password for protection at rest. Things become more complicated for interactive/online scenarios, with physical or virtual hardware security modules (HSMs) and key vaults, such as AWS Key Management Service (KMS), sometimes being used. These types of solutions are generally regarded as more secure than storing keys in encrypted configuration files and allow for easy key rotation but using a KMS requires trusting a third party.
-
Key pairs should be rotated: if a private key has or may have been compromised, then a new key pair should be generated. Similarly, you should consider rotating your keys after a set period of time (a cryptoperiod) has elapsed.
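To illustrate points 3 and 5, here’s a sketch with the `cryptography` package: an X25519 key exchange whose shared secret is run through HKDF-SHA512 with both public keys as context, and the 64-byte output split into two directional 256-bit keys:

```python
# Requires: pip install cryptography
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.kdf.hkdf import HKDF
from cryptography.hazmat.primitives.serialization import Encoding, PublicFormat

# Each party generates a key pair; in practice, the public keys are
# exchanged over the network and must be authenticated.
client_private = X25519PrivateKey.generate()
server_private = X25519PrivateKey.generate()

def raw(public_key) -> bytes:
    return public_key.public_bytes(Encoding.Raw, PublicFormat.Raw)

client_public = raw(client_private.public_key())
server_public = raw(server_private.public_key())

# Both sides compute the same shared secret...
shared_secret = client_private.exchange(server_private.public_key())

# ...which is not uniformly random, so derive the actual keys with a KDF,
# including both public keys as context information.
okm = HKDF(
    algorithm=hashes.SHA512(),
    length=64,
    salt=None,
    info=client_public + server_public,
).derive(shared_secret)

client_to_server_key, server_to_client_key = okm[:32], okm[32:]
```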
Digital Signatures
Use (in order):
-
Ed25519: very popular, accessible, fast, uses small keys, produces small signatures, deterministic, and offers ~128-bit security.
-
Ed448: less popular and slower than Ed25519 but uses SHAKE256 (a SHA3 variant) instead of SHA512 for hashing and edwards448 instead of edwards25519 for the curve, meaning a 224-bit security level.
Avoid (not in order because they’re all bad):
-
Plain RSA, RSA-PKCS#1 v1.5, and RSA-PSS: plain RSA is insecure, RSA-PKCS#1 v1.5 has no security proof and is no longer recommended in the RFC, and RSA-PSS is slow for signing and generating keys, produces larger signatures, and requires larger keys than ECC based signing algorithms. Moreover, RSA has implementation traps.
-
ElGamal: old, even slower than RSA, not included in cryptographic libraries, basically not used in any software, not standardised, produces large signatures, and if the message is used directly rather than hashed, as specified in the original paper, then that allows for existential forgery.
-
DSA: very old, becoming less and less supported, typically used with an insecure key size, slower than Ed25519, requires larger keys than ECC, and it’s not deterministic, which has led to serious vulnerabilities (please see below).
-
ECDSA: slower than Ed25519 and not deterministic, which has led to serious vulnerabilities that affected Sony’s PS3 and Bitcoin, allowing attackers to recover private keys. This issue can be prevented by properly generating a random nonce, which requires having a good CSPRNG, or by deriving the nonce deterministically using something like HMAC. However, there’s been a shift to Ed25519 because it prevents this issue from happening as well as being better in other respects. Furthermore, there’s also the concern mentioned in the Key Exchange/Hybrid Encryption Avoid section that the NIST curves use unexplained seeds, which is not a good look considering that Dual_EC_DRBG was a NIST standard despite containing an NSA backdoor.
-
Post-quantum algorithms: these are still being researched, aren’t implemented in mainstream libraries, are much slower, and typically have very large key sizes. However, it will eventually make sense to switch to one in the future.
Notes:
-
Please read points 1, 2, 9, and 10 of the Key Exchange/Hybrid Encryption Notes section because all these points about key pairs/private keys apply for signature algorithms as well.
-
Use authenticated hybrid encryption (an authenticated key exchange with authenticated encryption) instead of encryption with signatures: this is easier to get right and more efficient.
-
Use Sign-then-Encrypt if you must use signatures with encryption to provide sender authentication: Encrypt-then-Sign can allow an attacker to strip off the original signature and replace it with their own. For symmetric encryption, Sign-then-Encrypt-then-MAC, which involves signing the message, appending the signature to the message, and using either Encrypt-then-MAC or an AEAD, prevents this problem. Similarly, if you’re forced to use asymmetric encryption, then you can still use Sign-then-Encrypt but should include the recipient’s name or the sender and recipient’s names in the message because the recipient needs proof that the same person signed and encrypted the message. Once the signature and encryption layers are bound together, an attacker can’t remove and replace the outer layer because the reference in the inner layer will reveal the tampering. Alternatively, you can Encrypt-then-Sign-then-Encrypt or Sign-then-Encrypt-then-Sign, which are both slower.
-
Don’t use the same key pair for signatures (e.g. Ed25519) and key exchange (e.g. X25519): it’s recommended to never use the same key for more than one thing in cryptography. The security of using the same key pair for these two algorithms has not been (sufficiently) studied, signing key pairs and encryption key pairs often have different life cycles, and using different key pairs limits the damage done if one key pair is compromised. Since the keys are so small, using different key pairs produces barely any overhead as well. The only time you should really convert an Ed25519 key pair to an X25519 key pair is if you’re heavily resource constrained or when you’re forced to use Ed25519 keys (e.g. SSH public keys off GitHub could be used for hybrid encryption).
-
Prehash large messages: signing a message normally requires loading the entire message into memory, but this can be problematic for very large (e.g. 1+ GiB) messages. To solve this problem, you can use Ed25519ph or Ed448ph (which probably isn’t available) to perform the prehashing for you with some additional domain separation, or you can prehash the message yourself using a strong, modern hash function, like BLAKE2b or SHA3, with a 512-bit output length and sign the hash instead of the message (see the sketch after these notes). However, note that ordinary (non-prehashed) Ed25519 is resistant to collisions in the hash function, whereas the prehashed variants are not. Therefore, when possible, ordinary signing should arguably be preferred for additional protection, although this isn’t realistically a problem if you use a secure hash function.
-
Be aware of fault attacks against deterministic signatures: techniques like causing voltage glitches on a chip (e.g. on an Arduino) can be used to recover either the entire secret key or part of the secret key, depending on the signature algorithm, and create valid signatures with algorithms like Ed25519, Ed448, and deterministic ECDSA. However, this is primarily a concern on embedded devices and requires physical or remote access to a device. Four countermeasures include signing the same data twice and comparing the outputs, which is obviously slower than signing once, verifying the signature after signing, which is slower than signing twice for small messages but faster for large messages, calculating a checksum over input values before and after signature generation, or using a cryptographic library that implements the algorithm with some random data in the calculation of the nonce, which is the technique used by Signal. However, these countermeasures are not guaranteed to be effective, and there can be other side-channel attacks as well.
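As a sketch of the manual prehashing approach from point 5, using the `cryptography` package and BLAKE2b-512 from the standard library; the file path and chunk size are illustrative:

```python
# Requires: pip install cryptography
import hashlib

from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

signing_key = Ed25519PrivateKey.generate()
verify_key = signing_key.public_key()

# Hash the large message in chunks with BLAKE2b-512 instead of loading
# it all into memory, then sign the 64-byte digest.
hasher = hashlib.blake2b(digest_size=64)
with open("large-file.bin", "rb") as f:  # placeholder path
    for chunk in iter(lambda: f.read(64 * 1024), b""):
        hasher.update(chunk)
digest = hasher.digest()

signature = signing_key.sign(digest)

# The verifier recomputes the same prehash; verify() raises
# cryptography.exceptions.InvalidSignature on failure.
verify_key.verify(signature, digest)
```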
Asymmetric Key Size
Use (in order):
-
256-bit keys: the key size for X25519, which provides a ~128-bit security level. Why am I recommending this when I recommend 256-bit keys (a 256-bit security level) for symmetric encryption? Because 128-bit security means something different in the case of these asymmetric algorithms. Furthermore, X25519 is faster, more common, and more accessible than X448. Finally, when quantum computers do come along, ECC and RSA will be broken regardless of the key size anyway, so many people feel less of a need to use a higher security level curve.
-
448-bit keys: the key size for X448, which provides a 224-bit security level.
-
3072-bit or 4096-bit keys: if you’re forced to use RSA, then the minimum key size should be 3072-bit, which is the key size currently used by the NSA and recommended by ECRYPT for near term protection. The maximum should be 4096-bit because the performance is really bad after that. However, seriously don’t use RSA!
Avoid (not in order because they’re all bad):
-
1024-bit keys: these are no longer secure.
-
2048-bit keys: these only provide a 112-bit security level, which is below the standard 128-bit security level. Therefore, whilst commonly used and still safe as a minimum RSA key size, it makes sense to use 3072-bit keys instead.
-
8192-bit keys: these are slow to generate and excessive to store.
-
Post-quantum algorithm key sizes: these algorithms are still being researched, and the key sizes are very large compared to those for ECDH.
Concluding Remarks
I believe there are three main areas of improvement when it comes to individuals with experience in cryptography helping developers:
-
Cryptographic libraries should be better: most don’t make it easy to use cryptography safely (e.g. they support insecure algorithms, require nonces, etc) and have horrible documentation. This shouldn’t be the case, and there really should be greater uproar about this. Things need to be secure by default (e.g. insecure algorithms should never be implemented or get removed), and the documentation needs to be readable, as in concise, helpful to people of all skill levels, presented on a modern looking website rather than using basic HTML or a bunch of files on GitHub, and easy to navigate (e.g. supporting search functionality like GitBook does).
-
People should stop saying ‘don’t roll your own crypto’: repeating this phrase doesn’t help anyone. Instead, educate developers about how to do things properly, whether that be by answering questions on Cryptography Stack Exchange in an understandable manner, writing a blog, or replying to emails from people asking for help. It’s not a crime to implement Encrypt-then-MAC, and even when someone writes a custom cipher, you should explain why that’s not a good idea (e.g. ‘professional cryptographers still design insecure algorithms’).
-
There should be more peer review: it’s often difficult to receive peer review, impossible to fund a bug bounty program with cash rewards, and extremely unlikely for projects to get funding for a code audit. Whilst developers who fail to do any reading related to cryptography obviously deserve criticism, even experienced professionals make mistakes. Simple peer review (e.g. using the search on GitHub for things like ‘HMAC’ and ‘ECB’) helps catch things that are easy to spot, and more thorough peer review helps catch things that even someone experienced might have missed. If something seems dodgy, then you should investigate if possible.