SFrame.js: end to end encryption for WebRTC

If you have followed the news lately, you already know about insertable streams and how they enable end-to-end encryption in WebRTC.

You should also be aware of SFrame, the end-to-end encryption mechanism used in Google Duo and in Cosmo’s commercial products.

Last week, in an effort to widen the adoption of end-to-end encryption in WebRTC products and services, a more recent and more formal version of SFrame was uploaded as a standards-track IETF draft. A mailing list has also been set up to gather feedback from the community and improve the draft in future versions.

But there was a missing piece...

Running code

Before getting into the details, I would like to thank both Emad Omara (main author of the SFrame draft and co-author of MLS) for his patience explaining crypto stuff, and Lorenzo Miniero for the “free beta-testing” and collaboration when including SFrame.js in Janus. Lorenzo has also written a great article about the whole process that you must read now!

Differences from the current SFrame draft

  • The IV contains the keyId and the frame counter to ensure uniqueness when the same encryption key is used for all participants.
  • keyIds are limited to 5 bytes to avoid JavaScript signed/unsigned issues.
  • Option to skip the VP8 payload header and send it in clear.
  • Ed25519 is not used for sign/verify because it is not available in WebCrypto (although there is an intent to prototype in Blink and some skeleton code is already available in Chrome); ECDSA with P-521 is used instead.
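To make the first two points more concrete, here is a minimal sketch of how an IV can be built from the keyId and the frame counter. The 16-byte layout and the names buildIv and checkKeyId are assumptions made for illustration, not necessarily what SFrame.js does internally.

```javascript
// 5 bytes = 40 bits, well inside the range JavaScript numbers represent exactly.
const MAX_KEY_ID = 2 ** 40 - 1;

// Reject keyIds that do not fit in 5 bytes.
// Note that bitwise operators work on signed 32-bit values in JavaScript,
// so they cannot be used for values of this size.
function checkKeyId(keyId) {
  if (!Number.isInteger(keyId) || keyId < 0 || keyId > MAX_KEY_ID) {
    throw new RangeError("keyId must be an integer between 0 and 2^40 - 1");
  }
  return keyId;
}

// Hypothetical layout: 8 bytes of keyId followed by 8 bytes of frame counter,
// both big-endian. Two senders sharing the same key never produce the same IV
// because their keyIds differ.
function buildIv(keyId, counter) {
  const iv = new Uint8Array(16);
  const view = new DataView(iv.buffer);
  view.setBigUint64(0, BigInt(checkKeyId(keyId)));
  view.setBigUint64(8, BigInt(counter));
  return iv;
}
```

With this layout, buildIv(1, 5) and buildIv(2, 5) differ even though the frame counter is the same, which is the property needed when every participant encrypts with a common key.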

Bring your own KMS (BYOKMS)

Why is key management left to you? Mainly for two reasons: there is already an IETF effort under way to provide this feature, Messaging Layer Security (MLS), and it is quite common for organizations providing secure communications to already have some kind of KMS mechanism that SFrame could leverage.

The only requirement is that each participant in the conference must have an associated numeric id (the senderId) and an associated symmetric encryption key.

If you don’t have a KMS in place yet, you can still go for a simple e2ee scheme with a common encryption key shared across all participants and skip the signature part of SFrame, although this is not recommended.

How to use SFrame.js

Once you import the SFrame module into your project, you can create a Client which will be bound to the specified senderId.

While the future-proof way of supporting e2ee via insertable streams is to implement support for the generic RTP packetization and the generic video descriptor RTP header extension, we have added the skipVp8PayloadHeader option, which allows you to use the VP8 codec without them by sending the VP8 payload header in clear.

You can also use VP9 directly, as the VP9 payload descriptor containing the information required for SFU layer selection is added after the e2ee encryption, so it is sent in clear. Supporting H264 without the generic packetization or the generic video descriptor RTP header extension, while doable, is much more difficult and not worth the effort.

As said before, the keyIds are numeric because they need to be sent in each frame, so sending string UUIDs would cause too much overhead (especially on audio). Note that they are variable-length encoded, so starting from 0 and incrementing the counter for each participant provides the best performance.
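As an illustration of what sending the VP8 payload header in clear involves, the sketch below splits an encoded VP8 frame into the bytes left unencrypted and the bytes to encrypt. The sizes come from the VP8 format (3 bytes, or 10 bytes on key frames); the function names are made up for this example and the real SFrame.js code may differ.

```javascript
// Size of the VP8 payload header at the start of an encoded frame.
// Bit 0 of the first byte is 0 for key frames and 1 for inter frames.
function vp8HeaderSize(frame) {
  const isKeyFrame = (frame[0] & 0x01) === 0;
  return isKeyFrame ? 10 : 3;
}

// Split a frame into the part sent in clear and the part to encrypt.
function splitVp8Frame(frame) {
  const size = Math.min(vp8HeaderSize(frame), frame.length);
  return {
    clear: frame.subarray(0, size),
    toEncrypt: frame.subarray(size),
  };
}
```

The SFU can then keep reading the header bytes it needs, while the rest of the frame stays opaque to it.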
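The effect of variable-length encoding on overhead can be seen with a small sketch that encodes an unsigned integer in as few bytes as possible. This is only an illustration of the idea; the encodeUint name is hypothetical and the exact SFrame wire format is defined in the draft.

```javascript
// Encode a non-negative integer as big-endian bytes, using as few bytes as possible.
// Division is used instead of bit shifts so values above 32 bits are handled correctly.
function encodeUint(value) {
  if (!Number.isSafeInteger(value) || value < 0) {
    throw new RangeError("value must be a non-negative safe integer");
  }
  const bytes = [];
  let rest = value;
  do {
    bytes.unshift(rest % 256);
    rest = Math.floor(rest / 256);
  } while (rest > 0);
  return Uint8Array.from(bytes);
}
```

A keyId between 0 and 255 takes a single byte, while a value close to the 5-byte limit takes five, and that cost is paid on every frame.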

You also need to set the 32-byte encryption key (and optionally the private key for signing) before encrypting any data:
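If you just want key material to experiment with, the following sketch generates it with WebCrypto. It is an assumption for testing purposes: in a real deployment the symmetric key would come from your KMS, and generateSenderKeys is a hypothetical helper, not part of SFrame.js.

```javascript
// Works in browsers and in recent Node.js; older Node.js exposes WebCrypto via node:crypto.
const webcrypto = globalThis.crypto ?? require("node:crypto").webcrypto;

async function generateSenderKeys() {
  // 32 random bytes standing in for the symmetric key your KMS would hand out.
  const encryptionKey = webcrypto.getRandomValues(new Uint8Array(32));
  // ECDSA key pair: the private key signs, the public key is given to receivers.
  const signingKeys = await webcrypto.subtle.generateKey(
    { name: "ECDSA", namedCurve: "P-521" },
    true, // extractable, so the public key can be exported and distributed
    ["sign", "verify"]
  );
  return { encryptionKey, signingKeys };
}
```

The encryption key and signingKeys.privateKey are what the sender side needs; signingKeys.publicKey is what the other participants need in order to verify.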

Once this step is done, you can encrypt your peerconnection senders:

Note that you will need to pass a unique id to the encryption method, which can be either the transceiver.mid or any other id that you implement (as long as it is unique).

Internally, the encrypt method will create the insertable streams for the sender and transfer them to the web worker, so the encryption process is performed there.
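The core of that pipeline can be sketched as a TransformStream placed between a readable and a writable stream. In Chrome the pair would come from the sender's insertable streams (for example via createEncodedStreams(), depending on the browser version) and would be transferred to the worker; here any pair of streams works, and pipeFrames and processFrame are hypothetical names.

```javascript
// Pipe every frame from `readable` to `writable`, passing each one through
// an async per-frame function (encryption on senders, decryption on receivers).
function pipeFrames({ readable, writable }, processFrame) {
  const transform = new TransformStream({
    async transform(frame, controller) {
      controller.enqueue(await processFrame(frame));
    },
  });
  // The returned promise resolves once the readable side is closed and flushed.
  return readable.pipeThrough(transform).pipeTo(writable);
}
```

The receiver side has the same shape, only with a processFrame that decrypts instead of encrypting.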

You would do it similarly for decrypting the receivers:

When a frame with a new keyId is correctly decrypted on an RTCRtpReceiver's insertable streams, you will get an event so you can associate the authenticated senderId with that receiver.

However, in order to decrypt the frames received by the worker in the insertable streams, you will need to add a new receiver with its associated keyId and set up its symmetric key for encryption and its public key for verification:
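Conceptually, this amounts to keeping a table indexed by keyId, as in the sketch below. The ReceiverKeys class is a hypothetical illustration of the bookkeeping involved, not the SFrame.js API.

```javascript
// Hypothetical registry mapping each remote keyId to the keys needed for it.
class ReceiverKeys {
  constructor() {
    this.receivers = new Map();
  }

  // Register a remote participant before any of its frames can be decrypted.
  add(keyId, encryptionKey, verifyKey) {
    this.receivers.set(keyId, { encryptionKey, verifyKey });
  }

  // Called for each incoming frame with the keyId found in its header.
  lookup(keyId) {
    const entry = this.receivers.get(keyId);
    if (!entry) {
      throw new Error(`no keys registered for keyId ${keyId}`);
    }
    return entry;
  }
}
```

Frames carrying a keyId that was never registered cannot be decrypted, which is why receivers have to be added as participants join.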

Signature verification

SFrame.js periodically sends signature information for each stream and verifies that the signature received from remote senders is valid according to their public key, but it is not clear what the most appropriate way of signaling this back to the application is.

An event on successfully verifying the signature feels appropriate, but not all frames may be signed, and the frames carrying the signature may be dropped (either by the SFU or by the network). A binary “authentication verified”/“not verified” state on the stream therefore doesn’t seem appropriate, and a stats-based approach comparing frames verified with frames received might be better.
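One possible shape for such a stats-based approach is sketched below. It is only an idea to feed the discussion: the VerificationStats class and its fields are hypothetical, and nothing like it is claimed to exist in SFrame.js.

```javascript
// Hypothetical per-stream counters exposed to the application.
class VerificationStats {
  constructor() {
    this.framesReceived = 0;
    this.framesVerified = 0;
    this.framesFailed = 0;
  }

  // result is "verified", "failed" or "unsigned" (frame carried no signature).
  onFrame(result) {
    this.framesReceived++;
    if (result === "verified") this.framesVerified++;
    if (result === "failed") this.framesFailed++;
  }

  // Fraction of received frames whose signature was checked successfully.
  get verifiedRatio() {
    return this.framesReceived === 0 ? 0 : this.framesVerified / this.framesReceived;
  }
}
```

The application could then alert on any failed verification, or on a ratio that stays at zero for too long, instead of relying on a single verified/not verified flag.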

Key rotation and ratcheting

Sending fresh keys is an expensive operation, so the key management component might choose to send new keys only when clients leave the call and use hash ratcheting for the join case, so there is no need to send a new key to the clients who are already on the call.

SFrame and SFrame.js support both, by either updating the encryption key for the sender or receiver, or by ratcheting the sender key:

Note that you don’t need to ratchet the receiver keys as SFrame.js will automatically try to ratchet them when a frame decryption fails.
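The idea behind ratcheting and the automatic retry on the receiver side can be sketched as follows. This is a simplified illustration: a plain SHA-256 step stands in for whatever derivation the draft specifies, and ratchet, decryptWithRatchet and the tryDecrypt callback are hypothetical names.

```javascript
// Works in browsers and in recent Node.js; older Node.js exposes WebCrypto via node:crypto.
const webcrypto = globalThis.crypto ?? require("node:crypto").webcrypto;

// One ratchet step: derive the next key from the current one.
// The step is one-way, so a client that joins later cannot recover earlier keys.
async function ratchet(key) {
  return new Uint8Array(await webcrypto.subtle.digest("SHA-256", key));
}

// Receiver side: try the current key first, then ratchet forward a bounded
// number of steps. tryDecrypt(key) must resolve to null when decryption fails.
async function decryptWithRatchet(key, tryDecrypt, maxSteps = 8) {
  let candidate = key;
  for (let step = 0; step <= maxSteps; step++) {
    const plaintext = await tryDecrypt(candidate);
    if (plaintext !== null) {
      return { plaintext, key: candidate, steps: step };
    }
    candidate = await ratchet(candidate);
  }
  throw new Error("decryption failed after ratcheting");
}
```

When the sender ratchets its key after a new participant joins, existing receivers fail to decrypt the next frame with their current key, ratchet forward until decryption succeeds, and keep the resulting key from then on.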

Demo time!

What’s next?

  • Code review.
  • Improve SFrame draft with enhancements and feedback from the community.
  • Explore integration with different KMS, MLS being the preferred one.
