What is a PKI?

tbowan
September 19th 2019

Spoiler: PKI has been talked about so much that it has almost become an empty buzzword. So let’s demystify the concept to see what it brings and how to use it appropriately. It stands for Public Key Infrastructure and, put simply, it means securing network communications with certificates whose authenticity is established by trusted third parties (CAs).

A few years ago, I felt stupid when a recruiter asked me to define what a PKI was. Not being a fan of acronyms, and having memory lapses when it comes to the formal definitions of proper names, I asked the recruiter to spell out the acronym for me before exclaiming:

Huh, is that all it is?

tbowan

Well, that didn’t really help me convince them to hire me, but I have always thought it was stupid to miss out on opportunities because one forgets the meaning of an acronym.

Digital trust

It all starts with a problem of trust in the networks. When we communicate or access data, we can legitimately ask ourselves these three questions:

Who is the author? Technically, we then speak about authentication when we make sure of it, and about spoofing when we get tricked.
Has the data been modified? In IT, we talk about integrity. Cryptographers speak more of authenticity because they add the identity of the author.
Has the data been disclosed? This time, computer scientists and cryptographers agree and both talk about confidentiality.

In real life, face to face, things are simple. Evolution has given us very good skills in face recognition to identify our peers, as well as speech and hearing to transmit messages quite effectively. And if we want to be discreet, we can isolate ourselves from the group.

With writing, and crafting in general, it gets more complicated. With its characteristic ingenuity, humanity then invented seals. The principle is always the same:

Incorporate a little something unique or very difficult to reproduce.

It can be a handwritten signature, a wax seal, a stamp or any other technique that is difficult to copy. The presence of a seal makes it possible to determine the origin of an object and, when the seal is applied to an envelope or a container, to show that the latter has not been opened.

In our all-digital world, on the other hand, a physical seal is obviously no longer usable (unless you use birds to carry your communications). But don’t worry, humanity has developed suitable methods…

Cryptography

Cryptography was born out of the need to make messages unreadable. History has provided us with algorithms that met with varying degrees of success. Over time, we realized that we could do more than just make a message unreadable: we could also detect changes and even authenticate the sender.

This is exactly what we need and for this, three main categories of algorithms are used:

Symmetric algorithms, which use the same key to encrypt and decrypt messages. They are relatively fast but require sharing a key beforehand when you want to secure a communication;
Asymmetric algorithms, which use two keys that cancel each other out (what one encrypts, the other decrypts). They are slower but allow the public key to be shared with everyone (the private key is very difficult to derive from the public key);
Hash functions, which calculate a small fingerprint from any data. Any change in the input data produces a visibly different fingerprint.
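To make the fingerprint idea concrete, here is a minimal sketch using SHA-256 from Python's standard `hashlib` module. The choice of SHA-256 and the two sample messages are assumptions made for illustration; the article does not name a specific hash function.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 fingerprint of data as 64 hexadecimal characters."""
    return hashlib.sha256(data).hexdigest()


# Two messages that differ by a single character (illustrative data).
original = b"Transfer 100 euros to Alice"
tampered = b"Transfer 900 euros to Alice"

print(fingerprint(original))
print(fingerprint(tampered))
print(fingerprint(original) == fingerprint(tampered))
```

The two fingerprints have the same length whatever the size of the input, and changing one character of the message yields a completely different value.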

It is then possible to solve the three initial problems using these algorithms, symmetric, asymmetric and hashes.

Confidentiality: a symmetric algorithm makes the data unreadable. The key used can be encrypted with the recipient’s public key (the recipient is then the only one able to decrypt it and therefore use it) or else be negotiated by a dedicated algorithm (e.g. Diffie-Hellman), which then requires authentication of the correspondents.
Integrity: a hash function is used to calculate a fingerprint that will be attached to the message. Upon receipt, a new fingerprint will be calculated and will validate that the data has not been modified.
Authenticity & authentication: The message digest is then encrypted with the sender’s private key (this is called a signature and it is this which is attached to the message). The receiver can then validate that only the holder of the key was able to send the message since he is the only one who was able to encrypt the fingerprint of the message received.
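The signature mechanism in the last item can be sketched with textbook RSA on Python integers. This is a toy under loud assumptions: the key is built from two publicly known Mersenne primes, there is no padding, and the digest is simply reduced modulo the key, so it must never be used to protect anything. The names `sign`, `verify` and `digest` are hypothetical helpers, not a library API.

```python
import hashlib

# Toy RSA key (assumption for illustration): two well-known Mersenne primes.
# Real keys use secret, randomly generated primes and a padding scheme.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def digest(message: bytes) -> int:
    """Fingerprint of the message, reduced so that it fits under the modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """'Encrypt' the fingerprint with the private key: this is the signature."""
    return pow(digest(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """'Decrypt' the signature with the public key and compare fingerprints."""
    return pow(signature, E, N) == digest(message)


message = b"Meet me at noon"
signature = sign(message)

print(verify(message, signature))
print(verify(b"Meet me at midnight", signature))
```

The first check succeeds because the fingerprint recovered with the public key matches the one recomputed from the message; the second fails because the message was altered after signing.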

Ultimately, all confidence in online communications boils down to the use of private keys and the dissemination of public keys. Hence the name of this type of so-called “public key” trust infrastructure.

Certificates

A cryptographic certificate (or digital certificate) is nothing more than a structured computer document containing a public key, information about the identity of its owner and some control information (validity dates, permitted uses of the key, …).

Once provided with a certificate, you can communicate and exchange documents with its owner with confidence, since you are able to check the authenticity of the owner's messages. If the owner also wishes to authenticate you, you will have to provide your own certificate; we then speak of mutual authentication.

But there remains the problem of the transmission of the certificate itself and the trust that can be placed in this document.

For small groups of correspondents, the certificate can be distributed manually. But when the number of certificates or participants grows, this is no longer an option and certificates must then be distributed digitally.

This is why a certificate is always signed by a trusted third party. This third party calculates the fingerprint of the certificate and signs it with its private key. Only the third party can produce this signature, so if you trust this third party, you trust the certificates it has signed. Conversely, if you want your correspondent to trust you, you must ask the third party to sign your certificate for you.

And so that you can validate the signature on these certificates, the trusted third party has provided you with its own: another certificate, containing its identity, its public key and the control information (mentioning that it can sign certificates, among other things).

Certificates therefore form chains, each link being signed by the next one. And at the end of the chain, because it has to stop eventually, is a certificate that signs itself (we say it is self-signed). It is called the root certificate, or Certificate Authority (CA), since it has authority over the whole chain.

Moreover, from the CA's point of view, it is not so much a chain as a tree, of which it is the root, because each certificate can in fact sign several others, thus forming generations of certificates.

It is thus the CA certificate that is distributed and installed in the software carrying out the security checks.

Building a PKI consists in creating a self-signed root certificate, then creating any intermediate certificates (signed by the root) and finally signing the certificates of the applications and people who need to prove their identities. Conversely, using a PKI consists of integrating the root certificate and any intermediaries into your software.
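Both halves of that paragraph, building the chain and checking it against an installed root, can be illustrated with a toy model. Everything here is an assumption made for the sketch: the `Certificate` class, the `issue` and `verify_chain` helpers, the subject names, and the textbook RSA keys built from public Mersenne primes. Real certificates are X.509 documents and real validation also checks validity dates, key usages and revocation.

```python
import hashlib
from dataclasses import dataclass

E = 65537  # public exponent shared by all toy keys


@dataclass(frozen=True)
class Certificate:
    subject: str     # who owns the public key
    issuer: str      # who signed this certificate
    public_n: int    # the subject's public key (RSA modulus)
    signature: int   # the issuer's signature over the fields above


def make_key(p: int, q: int):
    """Return (public modulus, private exponent) for a toy RSA key."""
    return p * q, pow(E, -1, (p - 1) * (q - 1))


def _fingerprint(subject: str, issuer: str, public_n: int, modulus: int) -> int:
    data = f"{subject}|{issuer}|{public_n}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % modulus


def issue(subject, subject_n, issuer, issuer_n, issuer_d) -> Certificate:
    """Sign the subject's identity and public key with the issuer's private key."""
    h = _fingerprint(subject, issuer, subject_n, issuer_n)
    return Certificate(subject, issuer, subject_n, pow(h, issuer_d, issuer_n))


def verify_chain(chain, trusted_roots) -> bool:
    """chain[0] is the leaf and chain[-1] the self-signed root."""
    if not chain:
        return False
    # Pair every certificate with its signer; the root is paired with itself.
    for cert, parent in zip(chain, chain[1:] + [chain[-1]]):
        if cert.issuer != parent.subject:
            return False
        h = _fingerprint(cert.subject, cert.issuer, cert.public_n, parent.public_n)
        if pow(cert.signature, E, parent.public_n) != h:
            return False
    return chain[-1] in trusted_roots


# Building the PKI: root, then intermediate, then an application certificate.
m61, m89, m107, m127 = (2**k - 1 for k in (61, 89, 107, 127))
root_n, root_d = make_key(m61, m89)
inter_n, inter_d = make_key(m89, m107)
leaf_n, _ = make_key(m107, m127)

root = issue("Toy Root CA", root_n, "Toy Root CA", root_n, root_d)
intermediate = issue("Toy Intermediate CA", inter_n, "Toy Root CA", root_n, root_d)
leaf = issue("www.example.test", leaf_n, "Toy Intermediate CA", inter_n, inter_d)

# Using the PKI: the software only needs the root in its trust store.
print(verify_chain([leaf, intermediate, root], trusted_roots={root}))
print(verify_chain([leaf, intermediate, root], trusted_roots=set()))
```

The same chain is accepted when the root is in the trust store and rejected when it is not, which is why distributing the CA certificate is the step that matters.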

Concretely, TLS

Very soon after the birth of the World Wide Web, there was a need to secure online communications, the most telling example being the website of an online bank.

This security involves a layer of cryptography (e.g. TLS) inserted between the network protocol which conveys the data (encrypted) and the application layer which manages this data (as clear text).

It is this layer that ensures the authenticity and confidentiality of your exchanges. To do so, it uses the same algorithms as before and therefore relies on digital certificates: the server's, so that you can authenticate it, and, more rarely, yours when the server requires mutual authentication.

To distinguish between versions with and without TLS, it is customary to add an s to the name of the protocol when it is secured. HTTP, once protected by TLS, becomes HTTPS. Likewise, LDAP becomes LDAPS. And as always, there are exceptions, like DNS, which becomes DoT (DNS over TLS).

Authentication of users by certificates is one of the most secure methods but is generally shunned: on the one hand because the server must be configured specifically (and some applications are not compatible), and on the other hand because users must generate certificates that the administrator must then sign. It’s not very complicated (once you know what to do) but it’s tedious.

Authentication of servers, on the other hand, requires fewer operations: you simply have to install the CA in your browser. It’s not fundamentally complicated, but you still have to know how to do it.
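For a program rather than a browser, "installing the CA" amounts to telling the TLS layer which root certificates to trust. Here is a minimal sketch with Python's standard `ssl` module. The function names `client_context` and `fetch_server_certificate` and all file paths passed to them are hypothetical; only the `ssl` and `socket` calls are real.

```python
import socket
import ssl


def client_context(ca_file=None, client_cert=None, client_key=None):
    """Build a TLS client context that authenticates the server.

    ca_file: PEM file holding the root CA to trust (None = system defaults).
    client_cert, client_key: your own certificate and private key, only
    needed when the server requires mutual authentication.
    """
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    if client_cert is not None:
        context.load_cert_chain(certfile=client_cert, keyfile=client_key)
    return context


def fetch_server_certificate(host, port=443, ca_file=None):
    """Connect, let TLS validate the chain, and return the server certificate."""
    context = client_context(ca_file)
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


context = client_context()
# The default context refuses servers whose certificate chain or name is invalid.
print(context.verify_mode == ssl.CERT_REQUIRED)
print(context.check_hostname)
```

Passing the path of your own root certificate as `ca_file` makes the program trust the servers signed by your PKI; `fetch_server_certificate` is defined but not called here because it needs network access.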

And after that?

A PKI is therefore nothing more than a method for securing network communications based on cryptographic certificates whose authenticity is assured by trusted third parties, CAs that must be deployed in applications.

To make life easier for everyone, some CAs are also pre-installed by software vendors. This is the case with web browsers; if your server’s certificate is signed by one of these CAs, it will be recognized automagically by the browsers of your visitors who will have nothing to do to enjoy your website “safely”.

This advantage has created a very lucrative market for signing certificates. Companies whose CAs are integrated into browsers share this monopoly and charge for the privilege of being signed and therefore recognized by your visitors (from 179€/year at GlobalSign, $238/year at DigiCert).

Fortunately for all website administrators, things have changed since 2015, with the launch of Let’s Encrypt, a certification authority integrated into browsers that signs certificates for free and automatically (including ours).