Revisiting a Video Chat Application with Phoenix and WebRTC

By Philip Capel

Revisiting a Video Chat Application with modern JavaScript and Elixir/Phoenix

In a previous post we talked about implementing a simple video chat with WebRTC and Elixir. This update will touch on some of the API changes that have happened since.

NOTE: I will also be using JavaScript's async/await, closures, and modules.

For starters, the newest version of Elixir/Phoenix will not be compatible due to changes in the dependencies. I'm writing this for Elixir v1.9.2 and Phoenix 1.4.10. So let's jump right in! If you've already followed the installation guide for Phoenix, then we should be able to start with the following command:

mix phx.new phoenix_webrtc

Be sure to hit y to have the front-end dependencies installed.

NOTE: In my code on github, you'll notice that I used phoenix_webrtc_revisited as the project name.

This will create our project directory and set up our webpack configuration and front-end. This will make creating the simple version of our video chat much easier.

Since the project is pretty well bootstrapped, we can skip creating the controllers and handling the response with our index page. Instead, we can jump straight to getting our Channel set up for the traffic we expect.

This means un-commenting the channels setup, and changing the name of the topic that we expect for the handler:

# /phoenix_webrtc/lib/phoenix_webrtc_web/channels/user_socket.ex
# ...
# Channels
channel "video:*", PhoenixWebrtcWeb.VideoChannel

Now that we have that set up, we need to create the handler for the channel. Create a file in the same directory called video_channel.ex and put this in it:

# lib/phoenix_webrtc_web/channels/video_channel.ex
defmodule PhoenixWebrtcWeb.VideoChannel do
  use Phoenix.Channel

  def join("video:peer2peer", _message, socket) do
    {:ok, socket}
  end

  def handle_in("peer-message", %{"body" => body}, socket) do
    broadcast_from!(socket, "peer-message", %{body: body})
    {:noreply, socket}
  end
end

What we're doing here is setting up the sub-topic join with "video:peer2peer". We also want to handle sending the negotiation messages back out, so we need to create a handle_in function to deal with that.

Using the broadcast_from! function will broadcast to all subscribers except the socket that sent the originating message. This is nice since we don't want to have to make that distinction on the client side.
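To make the "everyone except the sender" rule concrete, here is a toy in-memory model written in JavaScript. It is not Phoenix code, and createTopic, join, and broadcastFrom are hypothetical names used only for illustration.

```javascript
// Toy model of a topic: NOT Phoenix code, just the delivery rule.
function createTopic() {
  const subscribers = new Set();
  return {
    // Register a subscriber and hand back an object holding its inbox.
    join(name) {
      const subscriber = { name, inbox: [] };
      subscribers.add(subscriber);
      return subscriber;
    },
    // Deliver to every subscriber except the one that sent the message.
    broadcastFrom(sender, event, payload) {
      for (const subscriber of subscribers) {
        if (subscriber !== sender) {
          subscriber.inbox.push({ event, payload });
        }
      }
    },
  };
}

const topic = createTopic();
const alice = topic.join('alice');
const bob = topic.join('bob');
topic.broadcastFrom(alice, 'peer-message', { body: 'hello' });

console.log(alice.inbox.length); // 0
console.log(bob.inbox.length); // 1
```

The sender never sees its own message, which is why the client code later in this post does not need to filter out its own offers and candidates.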

That's largely all the back-end setup that is required to get a basic example working. We'll have quite a bit more to do on the front-end however, since the WebRTC APIs have changed more since the last post. Let's start with some plain HTML.

<!-- lib/phoenix_webrtc_web/templates/page/index.html.eex -->
<label for="local-stream">Local Video Stream</label>
<video id="local-stream" autoplay muted></video>
<label for="remote-stream">Remote Video Stream</label>
<video id="remote-stream" autoplay></video>

<button id="connect">Connect</button>
<button id="call">Call</button>
<button id="disconnect">Disconnect</button>

A pretty simple layout with two video tags to handle the streams. We have a connect button so that we can start the requests when we're ready (i.e. once we've got the console open in both tabs), as well as call and disconnect. Make sure you add the autoplay part so that the video starts as soon as you assign the stream to the element. Muting the local video will also help with feedback.

Moving on to the actual client side code, we'll update the socket.js file to appear as follows:

// assets/js/socket.js
import { Socket } from 'phoenix';

let socket = new Socket('/socket', { params: { token: window.userToken } });

socket.connect();

let channel = socket.channel('video:peer2peer', {});

channel
  .join()
  .receive('ok', resp => {
    console.log('Joined successfully', resp);
  })
  .receive('error', resp => {
    console.error('Unable to join', resp);
  });

export default channel;

There are a lot of comments generated by phx.new, and we can go ahead and delete them. Additionally, we need to update the topic and subtopic of our channel connection. I personally like to use console.error to report errors, since it makes them easier to filter in the dev tools. We're also only going to export the channel, since we're not going to be connecting to any other topics.

Now we can wire up our buttons and capture our initial application state:

// assets/js/app.js
import css from '../css/app.css';

import 'phoenix_html';

import channel from './socket';

const connectButton = document.getElementById('connect');
const callButton = document.getElementById('call');
const disconnectButton = document.getElementById('disconnect');

const remoteVideo = document.getElementById('remote-stream');
const localVideo = document.getElementById('local-stream');
// Declared with let because disconnect() replaces it with a fresh stream
let remoteStream = new MediaStream();

setVideoStream(remoteVideo, remoteStream);

let peerConnection;

disconnectButton.disabled = true;
callButton.disabled = true;
connectButton.onclick = connect;
callButton.onclick = call;
disconnectButton.onclick = disconnect;

Two things to note here:

  1. We're creating an empty MediaStream for the remote video element and setting it as the stream right off. We can do this because we only expect to have the one remote video source, and it's pretty easy to handle this way.

  2. We are declaring a module level variable to hold the peer connection. This is a convenience to allow us to easily access it from within the handlers. There are other approaches that might be better, but this quick and dirty approach allows us to write a simpler example.

As a bit of an aside, since you're going to see these invoked in examples, here's the definition of my logging helpers:

// assets/js/app.js
const reportError = where => error => {
  console.error(where, error)
}

function log() {
  console.log(...arguments)
}

I just find them helpful and terse enough to add to small projects easily. The reportError function is curried because that makes it easier to pass in as a handler method to catch. Reporting the error alone without context is sometimes confusing.
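As a small illustration of why the currying matters, the sketch below passes the result of reportError to catch. The label 'adding ICE candidate' and the rejected promise are made up for the demonstration.

```javascript
const reportError = where => error => {
  console.error(where, error);
};

// Calling reportError with a label returns the actual one-argument handler.
const handler = reportError('adding ICE candidate');

// The handler fits directly into a promise chain and logs label + error.
Promise.reject(new Error('boom')).catch(handler);

// Note: passing reportError itself (without calling it first) would treat
// the error as the label and log nothing, because only the inner function
// calls console.error.
```

So the pattern is always catch(reportError('some context')), never catch(reportError).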

This is where we're going to diverge from the original. In the first post we had relied on callbacks in the API, but this is now deprecated in favor of promises. Given the nature of some of those promises, I think it makes for much more readable code to use async/await syntax. This will require a little configuration change, so we'll need to update our assets/.babelrc as such:

{
  "presets": [
    [
      "@babel/preset-env",
      {
        "useBuiltIns": "usage"
      }
    ]
  ]
}

This just tells Babel to supply polyfills based on the features we actually use, such as async/await. It may produce a warning about core-js, but this can be ignored for now.

Now that we can start writing in our app.js, we'll go ahead and make a pair of helpers to set/unset the video stream object:

function setVideoStream(videoElement, stream) {
  videoElement.srcObject = stream;
}

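function unsetVideoStream(videoElement) {
  if (videoElement.srcObject) {
    // Stop each track so the camera and microphone are released
    videoElement.srcObject.getTracks().forEach(track => track.stop())
  }
  videoElement.removeAttribute('src');
  // srcObject is a property, not an attribute, so clear it by assignment
  videoElement.srcObject = null;
}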

Setting the video stream to an element is easy enough, but unsetting it requires us to loop through the tracks and stop them prior to removing them. This just tells the device that the media is no longer required, instead of waiting for the object to be garbage collected.

We're going to call navigator.mediaDevices.getUserMedia to get a promise for the media stream object. This promise resolves once the user allows the page access to their devices. If they deny access, the promise rejects (typically with a NotAllowedError), and if they never respond to the prompt, it never settles. Since we can't really do any video chatting without the stream, we can safely hinge the program on an await of this promise. So we'll set up the local stream in the connect() function like so:

async function connect() {
  connectButton.disabled = true;
  disconnectButton.disabled = false;
  callButton.disabled = false;
  const localStream = await navigator.mediaDevices.getUserMedia({
    audio: true,
    video: true,
  });
  setVideoStream(localVideo, localStream);
}

Feel free to separate out the media constraints object passed into the getUserMedia method.
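For instance, a separated constraints object combined with explicit handling of a denied prompt might look like the sketch below. The names mediaConstraints and requestLocalStream are hypothetical, and mediaDevices is passed in as a parameter only so the sketch can run outside a browser; in the page it would be navigator.mediaDevices.

```javascript
// Hypothetical constraints object, separated out for reuse.
const mediaConstraints = { audio: true, video: true };

// Resolve to { stream, error } instead of throwing, so the caller can
// decide how to update the UI when access is denied.
async function requestLocalStream(mediaDevices, constraints = mediaConstraints) {
  try {
    const stream = await mediaDevices.getUserMedia(constraints);
    return { stream, error: null };
  } catch (error) {
    // A denied prompt rejects with an error named 'NotAllowedError'.
    return { stream: null, error };
  }
}
```

Inside connect() this could be used as const { stream, error } = await requestLocalStream(navigator.mediaDevices); with the buttons reset when error is set. Whether to return a result object or let the rejection propagate is a matter of taste.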

We're also setting the buttons' disabled state to reflect the fact that we are going to be connected. We can start fleshing out the disconnect button at this point:

function disconnect() {
  connectButton.disabled = false;
  disconnectButton.disabled = true;
  callButton.disabled = true;
  unsetVideoStream(localVideo);
  unsetVideoStream(remoteVideo);
  remoteStream = new MediaStream();
  setVideoStream(remoteVideo, remoteStream);
}

You should be able to start up the server and see the page rendering out your video streams. If you click the connect button you'll be prompted for permissions, and if you allow it, your camera will start streaming to the local video element!

But we don't really want to just stare ourselves down, so we need to set up the actual RTC portion with our peer. This will require that we create an RTCPeerConnection object and set up the correct event handlers on it:

async function connect() {
  // ...
  peerConnection = createPeerConnection(localStream);
}
// ...
function createPeerConnection(stream) {
  let pc = new RTCPeerConnection({
    iceServers: [
      // Information about ICE servers - Use your own!
      {
        urls: 'stun:stun.stunprotocol.org',
      },
    ],
  });
  pc.ontrack = handleOnTrack;
  pc.onicecandidate = handleIceCandidate;
  stream.getTracks().forEach(track => pc.addTrack(track));
  return pc;
}

In order to keep the connect() function fairly clean, we split out the creation logic for the connection object. We'll need to flesh out those event handlers though. It's also worth noting that there are several more handlers than this; however, the addstream handler from the previous post is now deprecated.

The createPeerConnection helper takes a MediaStream as an argument. This allows us to add the MediaStreamTrack objects for the local stream to the peer connection. Essentially this will ensure that once we connect with a peer, they'll be receiving the stream from our machine to theirs.

We should also go ahead and close the connection and nullify the module level variable when we disconnect:

function disconnect() {
  connectButton.disabled = false;
  disconnectButton.disabled = true;
  callButton.disabled = true;
  unsetVideoStream(localVideo);
  unsetVideoStream(remoteVideo);
  remoteStream = new MediaStream();
  setVideoStream(remoteVideo, remoteStream);
  peerConnection.close();
  peerConnection = null;
}

NOTE: In an actual application, you might want to clear the event handlers on the connection prior to closing it to avoid errors while it's being shut down.
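A minimal sketch of that teardown order, assuming only the two handlers set in createPeerConnection need clearing. The helper name closePeerConnection is hypothetical.

```javascript
// Detach handlers before closing so no callback fires mid-shutdown.
function closePeerConnection(pc) {
  // Nothing to do if we never connected (or already disconnected).
  if (!pc) return;
  pc.ontrack = null;
  pc.onicecandidate = null;
  pc.close();
}
```

In disconnect() this would replace the bare peerConnection.close() call, followed by peerConnection = null as before.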

Once we have our RTCPeerConnection we can use it to create an offer for connection. This is essentially calling another peer, so we'll put this into the call function. The createOffer method will return a promise that resolves to a session description. This description is what we need to send over the signaling server. We can await this description in the call function as well, and then set the localDescription of our peer connection as follows:

async function call() {
  let offer = await peerConnection.createOffer();
  await peerConnection.setLocalDescription(offer);
  channel.push('peer-message', {
    body: JSON.stringify({
      'type': 'video-offer',
      'content': offer
    }),
  });
}

Because we have several types of messages that require specific methods to be called in response, we add a type attribute to our message body. Essentially we'll have four types: 'video-offer', 'video-answer', 'ice-candidate', and 'disconnect'. The 'ice-candidate' event is referring to Interactive Connectivity Establishment (ICE).

The logic for sending a message will get pretty repetitive with four message types, so we can go ahead and refactor the call function with a new pushPeerMessage helper:

// ...
function pushPeerMessage(type, content) {
  channel.push('peer-message', {
    body: JSON.stringify({
      type,
      content
    }),
  });
}
// ...
async function call() {
  // ... all the code we wrote before channel.push
  pushPeerMessage('video-offer', offer);
}

This will make it quite a bit easier to chat with the server without needing to type out the structure of the message, and will save us from typos in the message string. Not that that's ever been a problem for me...
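One way to guard against typos in the type string as well is to keep the four types in a single frozen object and validate them when building the body. This is only a sketch; MessageType, encodePeerMessage, and decodePeerMessage are hypothetical names that are not part of the original code.

```javascript
// Hypothetical constants for the four message types used in this post.
const MessageType = Object.freeze({
  OFFER: 'video-offer',
  ANSWER: 'video-answer',
  CANDIDATE: 'ice-candidate',
  DISCONNECT: 'disconnect',
});

const knownTypes = new Set(Object.values(MessageType));

// Build the JSON body that gets pushed; reject unknown types early.
function encodePeerMessage(type, content) {
  if (!knownTypes.has(type)) {
    throw new TypeError(`unknown peer message type: ${type}`);
  }
  return JSON.stringify({ type, content });
}

// Parse a received body back into { type, content }.
function decodePeerMessage(body) {
  return JSON.parse(body);
}
```

With this in place, pushPeerMessage could send encodePeerMessage(MessageType.OFFER, offer) as its body, and a misspelled type would fail loudly on the sending side instead of silently reaching the peer.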

Let's flesh out the event handlers that we haven't defined yet, handleOnTrack and handleIceCandidate. These two functions are critical for negotiating the peer connection through the signaling server. For now though, we can just inspect their arguments to get an idea of how the program flows:

function handleOnTrack(event) {
  log(event);
}

function handleIceCandidate(event) {
  log(event);
}

Go ahead and run the application to see what gets logged out when you connect.

You should see some number of icecandidate events. If you inspect those, then you'll see that they have a candidate member. This is what we want to communicate to the other peer for our ICE. Once they settle on a candidate they can establish the p2p connection and start sending packets!

So we'll just flesh out the handler to send that message to our peer:

function handleIceCandidate(event) {
  if (!!event.candidate) {
    pushPeerMessage('ice-candidate', event.candidate);
  }
}

We're guarding here against a null candidate, which indicates that the candidate gathering process is done. There are other ways of handling this, but they would complicate this example.

With that, we can go ahead and start thinking about receiving the messages:

channel.on('peer-message', payload => {
  const message = JSON.parse(payload.body);
  switch (message.type) {
    case 'video-offer':
      log('offered: ', message.content);
      break;
    case 'video-answer':
      log('answered: ', message.content);
      break;
    case 'ice-candidate':
      log('candidate: ', message.content);
      break;
    case 'disconnect':
      disconnect();
      break;
    default:
      reportError('unhandled message type')(message.type);
  }
});

If you run the application and connect then call, you should see the messages hit the server, but nothing is output to the logs. This confirms that we aren't broadcasting the message back to ourselves.

If you open two tabs and click connect in both tabs, then call in tab 1, you should see the logs on tab 2. They should start with the initial 'video-offer', and then follow with the 'ice-candidate's.

We're not going to implement anything to allow the user to accept/decline an incoming call, so we'll just write a helper to push back an answer.

async function answerCall(offer) {
  let remoteDescription = new RTCSessionDescription(offer);
  // Wait for the offer to be applied before creating the answer
  await peerConnection.setRemoteDescription(remoteDescription);
  let answer = await peerConnection.createAnswer();
  await peerConnection.setLocalDescription(answer);
  pushPeerMessage('video-answer', peerConnection.localDescription);
}

We can just call that from the 'video-offer' case with our message content:

channel.on('peer-message', payload => {
  const message = JSON.parse(payload.body);
  switch (message.type) {
    // ...
    case 'video-offer':
      log('offered: ', message.content);
      answerCall(message.content);
      break;
    // ...
  }
});

Now you should be able to connect between two tabs, and when you call from one you'll see the answer logged out (along with some new 'ice-candidates'). At this point though, you'll get an error about the ICE negotiation failing. This is because we aren't actually adding the ICE candidates to our peer connection yet. This is simple enough though:

// ...
case 'ice-candidate':
  log('candidate: ', message.content);
  let candidate = new RTCIceCandidate(message.content);
  peerConnection.addIceCandidate(candidate).catch(reportError('adding ICE candidate'));
  break;
// ...
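One caveat: addIceCandidate rejects if the remote description has not been set yet, which can happen when a candidate arrives before the offer or answer has been applied. A common workaround, sketched here under the assumption that buffering is acceptable, is to queue early candidates. The helper createCandidateQueue is hypothetical and not part of the original code.

```javascript
// Hypothetical helper: hold candidates until the remote description exists.
function createCandidateQueue(pc) {
  const pending = [];
  return {
    // Add right away when possible, otherwise buffer for later.
    async add(candidate) {
      if (pc.remoteDescription) {
        await pc.addIceCandidate(candidate);
      } else {
        pending.push(candidate);
      }
    },
    // Call after setRemoteDescription resolves to apply buffered candidates.
    async flush() {
      while (pending.length > 0) {
        await pc.addIceCandidate(pending.shift());
      }
    },
    // Number of candidates still waiting.
    get size() {
      return pending.length;
    },
  };
}
```

The 'ice-candidate' case would call add(candidate) on the queue, and the code that sets the remote description would call flush() once that promise resolves. For the two-tab demo in this post the simpler direct call is usually enough.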

The last thing to do is to handle receiving an answer from the remote client. This is where we finally see the fruits of our labor! We're going to need to set the remote description again here, so we can reduce some duplication in the answerCall function by breaking out the following:

function receiveRemote(offer) {
  let remoteDescription = new RTCSessionDescription(offer);
  // Return the promise so callers can wait for the description to be applied
  return peerConnection.setRemoteDescription(remoteDescription);
}

async function answerCall(offer) {
  await receiveRemote(offer);
  let answer = await peerConnection.createAnswer();
  await peerConnection.setLocalDescription(answer);
  pushPeerMessage('video-answer', peerConnection.localDescription);
}

Now we can receive the video answer easily enough:

case 'video-answer':
  log('answered: ', message.content);
  receiveRemote(message.content);
  break;

Simply setting up the remote isn't enough though. If you try it out, you'll quickly notice that the remote video isn't actually showing us anything. This is because the stream's tracks are never handled, so we need to jump back into handleOnTrack:

function handleOnTrack(event) {
  remoteStream.addTrack(event.track);
}

Finally, we just need to make sure that our disconnect is being communicated to the peer.

function disconnect() {
  // Already disconnected (e.g. the peer echoed our 'disconnect'): nothing to do
  if (!peerConnection) return;
  connectButton.disabled = false;
  disconnectButton.disabled = true;
  callButton.disabled = true;
  unsetVideoStream(localVideo);
  unsetVideoStream(remoteVideo);
  peerConnection.close();
  peerConnection = null;
  remoteStream = new MediaStream();
  setVideoStream(remoteVideo, remoteStream);
  pushPeerMessage('disconnect', {});
}

And with that, this example is done. You should now be able to deploy this and connect between peers.

NOTE: If your connection doesn't work, it's possible that the peer that you wish to connect to is behind a symmetric NAT or some other restrictive firewall. You'll need to establish a TURN server as a proxy to the peer to peer connection. This is outside the scope of this post though.
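For orientation only, a TURN entry goes in the same iceServers list as the STUN server, with a username and credential. In the sketch below every URL and credential is a placeholder rather than a real server, and buildIceServers is a hypothetical helper.

```javascript
// Build an iceServers list with an optional TURN relay.
// All URLs and credentials here are placeholders, not real servers.
function buildIceServers(turn) {
  const servers = [{ urls: 'stun:stun.example.org' }];
  if (turn) {
    servers.push({
      urls: turn.urls,
      username: turn.username,
      credential: turn.credential,
    });
  }
  return servers;
}

const iceServers = buildIceServers({
  urls: 'turn:turn.example.org:3478',
  username: 'demo-user',
  credential: 'demo-secret',
});
// In the browser: new RTCPeerConnection({ iceServers })
```

Running and securing the TURN server itself is a separate task that this sketch does not cover.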

The source code for this exercise can be found on github.

It's worth noting that there are quite a few features that we didn't explore that would be pretty critical to getting a fully serviceable application running. However, this is a fun way to get your feet wet with WebRTC, and thanks to Phoenix it's super easy.

If you have any questions/comments/concerns feel free to reach out to me on GitHub. Otherwise, happy hacking!