The Problem
Recently I came across a problem where I had to record/stream a user's webcam and their screen share simultaneously to a back-end. This seemed like a fairly straightforward problem to solve: just get the streams using the Media Devices interface's getUserMedia and getDisplayMedia APIs. Then, depending on the back-end setup, we could pass these streams through the MediaStream Recording API, get (let's say, 1-second) chunks of binary video data, and stream those chunks to the back-end using WebSockets.
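As a rough sketch of that streaming idea (the WebSocket endpoint here is hypothetical, and stream stands in for either of the captured streams):

const ws = new WebSocket('wss://example.com/ingest'); // hypothetical endpoint
const recorder = new MediaRecorder(stream, { mimeType: 'video/webm; codecs=vp8' });
recorder.ondataavailable = (e) => {
  // e.data is a Blob holding roughly one second of WebM video
  if (ws.readyState === WebSocket.OPEN) {
    ws.send(e.data);
  }
};
recorder.start(1000); // emit a chunk every 1000 ms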
This setup works very well if you have the back-end resources to merge those separate chunks of binary video into two complete files, or to merge the two streams into a single side-by-side video file.
And this got me wondering: is there a way to combine two streams (or videos, for that matter) into a single side-by-side video, all in the browser itself?
The Canvas API to the rescue
Well, it turns out that the MediaStream Recording API can literally record a <canvas> element, via its captureStream method (<video> and <audio> elements have a captureStream method too, by the way). And, guess what? We can "render" a video onto the canvas using the drawImage method of the 2D rendering context.
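In essence, the trick looks like this (a minimal sketch; videoElement stands in for any playing <video> element):

const canvas = document.createElement('canvas');
const ctx = canvas.getContext('2d');
// Draw the current video frame onto the canvas...
ctx.drawImage(videoElement, 0, 0, canvas.width, canvas.height);
// ...and capture the canvas as a MediaStream that MediaRecorder can record
const recordableStream = canvas.captureStream(30); // 30 fps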
Voila! Bingo! We have a solution in our hands.
Let's code this up now.
Demo Time
Here, I am using plain JavaScript for the demo.
First, we need two global variables to store our streams for the webcam and the screen.
let camVideoStream = null;
let screenVideoStream = null;
And we'll also need a const to store the options that we'll later pass to the MediaRecorder constructor.
const options = {
  mimeType: 'video/webm; codecs=vp8',
};
For the mimeType, I found the WebM container with the VP8 codec to be the best supported across Chrome and Firefox. I used this Stack Overflow answer to figure this out.
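If you want to be defensive about it, MediaRecorder.isTypeSupported lets you check support at runtime (the candidate list below is just my own fallback order):

// Pick the first mimeType the browser can actually record
const candidates = ['video/webm; codecs=vp8', 'video/webm; codecs=vp9', 'video/webm'];
const supportedMimeType = candidates.find((t) => MediaRecorder.isTypeSupported(t));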
And we'll also need an array to store our video chunks into.
const chunks = [];
Now, we can have two buttons in the document, each of which asks for permission and starts its stream (one for the screen, one for the webcam).
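(For reference, the markup is just two buttons whose ids match the selectors used below; the labels are up to you.)

<button id="startScreen">Start screen share</button>
<button id="startCam">Start webcam</button>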
document.querySelector('#startScreen').addEventListener('click', async (e) => {
  const stream = await navigator.mediaDevices.getDisplayMedia({
    video: {
      // Hint to the browser that we'd prefer a capture of the whole screen
      displaySurface: 'monitor',
    },
  });
  e.target.disabled = true;
  screenVideoStream = stream;
  // Start recording once both streams are available
  if (camVideoStream) {
    startRecording();
  }
});
document.querySelector('#startCam').addEventListener('click', async (e) => {
  const stream = await navigator.mediaDevices.getUserMedia({
    video: {
      // Ask for at least 720p from the webcam
      width: {
        min: 1280,
      },
      height: {
        min: 720,
      },
    },
  });
  e.target.disabled = true;
  camVideoStream = stream;
  // Start recording once both streams are available
  if (screenVideoStream) {
    startRecording();
  }
});
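One thing the demo glosses over: both getUserMedia and getDisplayMedia reject if the user dismisses the permission prompt, so in real code you'd probably wrap the calls. A sketch of a more defensive webcam handler (the error handling here is purely illustrative):

document.querySelector('#startCam').addEventListener('click', async (e) => {
  try {
    camVideoStream = await navigator.mediaDevices.getUserMedia({
      video: { width: { min: 1280 }, height: { min: 720 } },
    });
    e.target.disabled = true;
    if (screenVideoStream) startRecording();
  } catch (err) {
    // Typically a NotAllowedError when the prompt is dismissed
    console.error('Could not start the webcam:', err);
  }
});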
Now, we can create the startRecording function.
function startRecording() {
  const screenVideo = document.createElement('video');
  screenVideo.srcObject = screenVideoStream;
  // Getting the dimensions of the screen stream
  const { width: screenVideoStreamWidth, height: screenVideoStreamHeight } =
    screenVideoStream.getVideoTracks()[0].getSettings();

  const camVideo = document.createElement('video');
  camVideo.srcObject = camVideoStream;
  // Getting the dimensions of the webcam stream
  const { width: camVideoStreamWidth, height: camVideoStreamHeight } =
    camVideoStream.getVideoTracks()[0].getSettings();

  screenVideo.play();
  camVideo.play();
  // Calculating the dimensions of our canvas where we'll place the videos side by side
  // (the extra 2px leaves a small gap between the two videos)
  const width = camVideoStreamWidth + screenVideoStreamWidth + 2;
  const height = Math.max(camVideoStreamHeight, screenVideoStreamHeight);

  const canvas = document.createElement('canvas');
  canvas.width = width;
  canvas.height = height;
  const ctx = canvas.getContext('2d');
  const interval = setInterval(() => {
    // Drawing the videos at their desired positions
    // You can also draw text (like a timestamp) or anything else the canvas supports
    ctx.drawImage(
      screenVideo,
      0,
      0,
      screenVideoStreamWidth,
      screenVideoStreamHeight
    );
    ctx.drawImage(
      camVideo,
      screenVideoStreamWidth + 2,
      0,
      camVideoStreamWidth,
      camVideoStreamHeight
    );
  }, 1000 / 30); // 30 fps
  const recorder = new MediaRecorder(canvas.captureStream(30), options);
  recorder.ondataavailable = (e) => {
    chunks.push(e.data);
  };
  recorder.onstop = (e) => {
    // Assemble the chunks into a single WebM file and trigger a download
    const blob = new Blob(chunks, { type: 'video/webm' });
    const url = URL.createObjectURL(blob);
    const a = document.createElement('a');
    a.href = url;
    a.download = 'combined.webm';
    a.click();
  };
  // Creating chunks of 1 second each
  recorder.start(1000);
  // Stopping the recording after 10 seconds
  setTimeout(() => {
    camVideo.pause();
    screenVideo.pause();
    camVideoStream.getTracks().forEach((track) => track.stop());
    screenVideoStream.getTracks().forEach((track) => track.stop());
    clearInterval(interval);
    recorder.stop();
  }, 10000);
}
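One caveat: as written, the recording is video-only. If you also want microphone audio, you can request audio in getUserMedia and add that audio track to the canvas stream before handing it to the MediaRecorder. A sketch of the idea (assuming getUserMedia was called with audio: true):

const canvasStream = canvas.captureStream(30);
// Mix the webcam's audio track into the stream we're about to record
camVideoStream
  .getAudioTracks()
  .forEach((track) => canvasStream.addTrack(track));
const recorder = new MediaRecorder(canvasStream, options);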
Here's how the output looks. (This is a single frame from the video output.)
And there we have it.
This was a lot of fun to figure out, and I have a ton of ideas for things I can use this in. I'll have more canvas adventures in the future.