Technical real-time (audio) analysis directly in the browser

Motivation

While experimenting with a gamma spectrometer I found a bunch of references to people who used their sound card to analyze the pulses from various radiation detectors. One popular example is PRA, a free PC application that implements a multichannel analyzer using the sound input.

During my previous experiment I used tools like Audacity to record pulses from my scintillation detector just to get an idea of what the signal looked like over time.

I thought it would be interesting to see if the Web Audio API, which is supported by most browsers, could be used for this (and other) kinds of data collection directly in the browser, using just JavaScript and HTML5.

This would open up a whole host of applications that would be inherently open source and platform independent.

What kind of data do you get

The sound card I used for this experiment was the guts of a Logitech USB headset, where I simply soldered my signal input directly to the microphone input:

Sound card out of an old headset

According to my operating system this device runs at 48kHz sample rate internally so I used this sample rate for the experiments.

 

Using my Siglent SDG6022X signal generator I generated some pulses that closely resemble those from a photomultiplier tube:

 The pulses are sharp negative spikes with a duration of about 300µs. I made the pulses repeat at a rate of 100 per second.

I took a look at an Audacity recording of these pulses:

100 pulses per second recorded at 48kHz

I saw that the amplitude of the recorded pulses varied wildly over the first 10 seconds and then settled in a sort of oscillating pattern after that.

I have encountered this before but had always attributed it to some problem with Audacity and never thought more of it. The pulses also didn’t look anything like the original:

Pulse is squashed with post ringing

If one wanted to, for example, very accurately measure the amplitude of these pulses, this would have been a total showstopper right there.

Initially the data is not raw

I looked into the problem and discovered that my operating system (Windows 11) by default applies “sound enhancement” in the form of noise and echo cancellation to microphone inputs. This is meant to “enhance your teleconferencing experience”, but it totally ruins the integrity of technical signals like this. The pulses sounded like an AC unit with a bad fan and simply got filtered out.

Fortunately this feature can be disabled in the audio settings:

I believe this can be done on other operating systems as well.

After turning this “feature” off I got much more consistent recordings:

Recording after sound enhancement was turned off

The amplitude went up a lot without the enhancement filter, so the microphone input volume had to be turned way down for the same signal.

Pulse with no enhancement

The pulses now looked much more like the original, with just a slight tail. The tail is an artifact, partly from some input capacitor in the sound card being charged up by the pulse and partly from the sampling process itself. It is consistent and does not pose an immediate problem.

I still observed amplitude oscillation, however, and this is also an effect of the sampling process. When pulses arrive, they do so at a random time relative to when the samples are taken:

Two pulses appearing at different times with respect to the samples

Above are two pulses compared. They are taken from different parts of the recording and represent two extreme cases. If you were to just take the tallest sample (lowest, in this case) of each pulse, you would get very different results for the pulse amplitude.

According to the Nyquist–Shannon sampling theorem, the exact same, complete information is preserved in both pulses; it is just shifted in time with respect to when the samples were taken.

So how do you fix this? Surely it is not possible to shift the signal by a fraction of the time between samples?! It turns out it is.

Split Unit Sample Delay

By applying a special filter it is possible to shift the sampled signal by any fraction of a sample delay. This technique is widely used in signal processing for things like interpolation and sample rate conversion. You can read a practical guide in this article.
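To make the idea concrete, here is a minimal JavaScript sketch of such a fractional-delay filter, built from a Hann-windowed sinc kernel. The kernel length, window choice and function names are my own assumptions for illustration; the demo below may use different parameters.

// Minimal sketch of a fractional-delay (windowed-sinc) interpolator.
// Kernel length and window are illustrative choices, not the demo's exact filter.

// Build an FIR kernel whose output is the input evaluated `frac` samples
// (0 <= frac < 1) after each integer sample position.
function fractionalDelayKernel(frac, halfLength = 8) {
	const kernel = [];
	for (let i = -halfLength; i <= halfLength; i++) {
		const x = i - frac;                   // distance from the fractional position
		const sinc = x === 0 ? 1 : Math.sin(Math.PI * x) / (Math.PI * x);
		// Hann window (centered on the fractional position) to truncate the
		// ideal, infinitely long sinc without excessive ringing
		const w = 0.5 * (1 + Math.cos(Math.PI * x / (halfLength + 1)));
		kernel.push(sinc * w);
	}
	return kernel;
}

// Evaluate the sampled signal at the non-integer position `index + frac`.
function sampleAtFraction(samples, index, frac, halfLength = 8) {
	const kernel = fractionalDelayKernel(frac, halfLength);
	let sum = 0;
	for (let i = -halfLength; i <= halfLength; i++) {
		sum += (samples[index + i] ?? 0) * kernel[i + halfLength];  // zero outside the buffer
	}
	return sum;
}

For example, sampleAtFraction(samples, 10, 0.5) estimates the signal value halfway between samples 10 and 11.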

To explore this in the present context (trying to get the correct amplitude of pulses) I made this demo:

Move the delay slider to get one of the red samples to the maximum (lowest) value.

The input signal is actual data snipped from the Audacity recording. The dashed grey lines are calculated by progressively varying the delay between zero and one sample to get the intermediate sample values.

I set out to make an algorithm that uses this technique to accurately evaluate the amplitude of such pulses. More about that later.

Getting audio input into the browser code

Many examples describe how you can manipulate audio with code in your browser. Most are concerned with routing sound around between various inputs, outputs and media files and applying various effects and filters.  I wanted access to the actual samples in real time as they came into JavaScript code running in my browser.

I won’t go into too many details but it turns out this is how it is done:

The gist is that you send some (JavaScript) source code to the API, which then runs it in a sandboxed environment that you have limited access to. That code can send and receive buffers to/from audio devices as well as do your own processing on them.

Some limitations due to (browser) safety

The main JS code (running on the page you load in the browser) can only communicate with the protected code via a tightly controlled message channel.

For safety reasons the code you supply MUST originate from a secure origin, e.g. localhost (a web server on your local computer) or an https connection. An HTML file on the hard drive of your computer is apparently not considered a secure origin. Even the audio being processed by the protected code is subject to these restrictions: you cannot, for example, play an audio clip and get the sound processed unless it also originates from a secure place.

Since I like to make my projects as neatly packaged, single .htm files (I have my reasons), this gave me some problems. I eventually did find a way around it, with some caveats though. If you run a .htm file in your browser from your hard drive you can get access to physical audio inputs and outputs, but you cannot choose which specific input and output to use. This is chosen globally by your operating system. The same applies to speaker volume and microphone gain control.

Barebone example

This is the simplest code I have been able to come up with that runs completely within a single .htm file and is not dependent on a local or remote web server:

 

<html>
<body>
	<button id="mikeButton" onclick="mikeControl()">MIKE ON</button>
<script>
	const moduleScript = `
	class DZL_AudioRenderer extends AudioWorkletProcessor {
	 	constructor() {
	    	super();
	    	this.running=true;
			this.port.onmessage = (e) => {
				if(e.data=="stop")
				{
					this.running=false;	
		      		this.port.postMessage("Stopping");
				}
	    	};    	
	  	}
	  	process(inputList, outputList, parameters) {
	  		//	This is where the magic happens:
	  		//	forward the raw input buffers to the main thread
	  		this.port.postMessage({"input":inputList});
	  		return this.running;
	  	}
	}
	registerProcessor("DZL_AudioSource", DZL_AudioRenderer);
	`
	var audioContext;
	var AudioRenderer;
	var mikeON=false;
	async function mikeControl()
	{
		if(mikeON)
		{
			AudioRenderer.port.postMessage("stop")
			document.getElementById("mikeButton").innerHTML="MIKE ON";
			mikeON=false;
		}
		else
		{
			if(!audioContext)
			{
				//	The first time we press the ON button we send the renderer code to the API.
				//	This needs to happen after the user has asked for it.
				audioContext=new window.AudioContext({sampleRate: 48000});
				var blob = new Blob([moduleScript], {type: 'application/javascript'});
  				var reader = new FileReader();
  				reader.onloadend = async ()=>{
					await audioContext.audioWorklet.addModule(reader.result);
					if(await startMike())
					{
						document.getElementById("mikeButton").innerHTML="MIKE OFF";
						mikeON=true;							
					}
				}
  				reader.readAsDataURL(blob);
			}
			else
			{
				if(await startMike())
				{
					document.getElementById("mikeButton").innerHTML="MIKE OFF";
					mikeON=true;							
				}
			}
		}
	}
	
	async function startMike()
	{
		//	We set up a worklet node that just redirects buffers to our DSP
		AudioRenderer = new AudioWorkletNode(audioContext, 'DZL_AudioSource');
		AudioRenderer.port.onmessage=(m)=>
		{
			//	Data coming back from our renderer module
			if(m.data["input"])
			{
				var buffers=m.data["input"][0];
				if(!buffers || !buffers[0] || buffers[0].length==0)
					return;
				DSP(buffers[0]);
			}
		};
		//	Then we ask for the available microphone input, with all
		//	"sound enhancement" processing turned off
		const constraints = {
				audio:{
					noiseSuppression: false,	//	no noise suppression
					echoCancellation: false,	//	no echo cancellation
					autoGainControl: false,		//	no automatic gain control
				},
				video: false,
			};
		//	Finally we attach it to our renderer
		try
		{
			const stream = await navigator.mediaDevices.getUserMedia(constraints);
			const microphone = audioContext.createMediaStreamSource(stream);
			//	`microphone` can now act like any other AudioNode
			microphone.connect(AudioRenderer);
			return true;
		}
		catch(err)
		{
			return false;
		}
	}

	function DSP(samples)
	{
		//	`samples` is a Float32Array of 128 raw samples from the microphone
		samples.forEach((sample)=>{
			//******************************
			//	Individual
			//	samples available
			//	here:
			//******************************
		});
	}
</script>
</body>
</html>

At the bottom, samples from the microphone are available to be processed by whatever JavaScript code you may want to employ.

NOTE: I found out that it is possible to have the audio API itself turn off noise and echo cancellation and automatic gain control. This way the user does not have to interact with the operating system. (In the example above this happens in the constraints passed to getUserMedia.)

The following example is based on this.

Example: multichannel analyzer

The purpose of a multichannel analyzer is to determine the amplitude of incoming pulses, like the ones described earlier, and produce a histogram of the various pulse amplitudes. The following assumes some kind of detector is connected to the audio input. The detector will emit randomly timed pulses whose amplitudes represent the energy of each pulse. A row of counters, referred to as channels or bins, is assigned to ranges of pulse heights; a counter is incremented for each pulse that falls within its range. The counters (histogram) form a spectrum that gives an indication of the nature of the stuff the detector is measuring.

How the example works:

When pressing the audio ON button the microphone (or line in) is configured to start taking data.

The data is received in the form of sample buffers of 128 float values each. These buffers are added to an array until it contains 10 buffers; thereafter the oldest buffer in the array is removed upon arrival of each new buffer. This array of 10 buffers is continuously searched for any sample that exceeds the Trigger level. A number of samples before and after this point are transferred to a separate buffer and sent off to be processed as ‘an event’.
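As an illustration of this scheme, here is a rough sketch of the rolling buffer and trigger. The ring size matches the 10 buffers described above; the pre/post window sizes, the trigger level and the assumption of negative-going pulses are illustrative choices, not necessarily what the demo uses.

// Rough sketch of the rolling-buffer trigger, assuming negative-going pulses.
// Window sizes and trigger level are illustrative values only.

const RING_SIZE = 10;        // keep the 10 most recent 128-sample buffers
const PRE_SAMPLES = 16;      // samples kept before the trigger point
const POST_SAMPLES = 48;     // samples kept after the trigger point
const TRIGGER_LEVEL = -0.1;  // pulses are negative-going spikes

const ring = [];             // array of Float32Array(128)

// Call this for every 128-sample buffer coming out of DSP() above.
function onBuffer(samples, onEvent) {
	ring.push(samples);
	if (ring.length > RING_SIZE) ring.shift();    // drop the oldest buffer
	if (ring.length < RING_SIZE) return;          // wait until the ring is full

	// Flatten the ring so an event may straddle buffer boundaries.
	const flat = new Float32Array(RING_SIZE * 128);
	ring.forEach((b, i) => flat.set(b, i * 128));

	// Scan only the second-newest buffer: each sample is then examined exactly
	// once (the ring advances one buffer per call) and always has at least one
	// full buffer of context on either side for the pre/post window.
	const start = (RING_SIZE - 2) * 128;
	for (let i = start; i < start + 128; i++) {
		if (flat[i] < TRIGGER_LEVEL && flat[i - 1] >= TRIGGER_LEVEL) {
			// Downward threshold crossing found: cut out the event and hand it off.
			onEvent(flat.slice(i - PRE_SAMPLES, i + POST_SAMPLES));
		}
	}
}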

Each event is run through an interpolation filter that inserts 4 new samples between each original pair of samples, using the delay method described earlier. This new, more detailed event is then searched for the sample with the largest amplitude. This ensures that the peak of each pulse is captured more accurately.
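A sketch of what this peak measurement could look like, reusing the sampleAtFraction() helper from the fractional-delay sketch earlier. The ×5 upsampling matches the four inserted samples; treating the pulse peak as the minimum (negative-going pulses) is my assumption.

// Sketch of the peak measurement: insert 4 interpolated samples between each
// pair of originals (5x upsampling) and take the most negative value.
// Uses sampleAtFraction() from the fractional-delay sketch above.

function pulseAmplitude(event) {
	const factor = 5;             // 4 interpolated samples between each original pair
	let peak = 0;                 // pulses are negative-going, so look for the minimum
	for (let i = 0; i < event.length; i++) {
		if (event[i] < peak) peak = event[i];
		if (i === event.length - 1) break;
		for (let k = 1; k < factor; k++) {
			const v = sampleAtFraction(event, i, k / factor);
			if (v < peak) peak = v;
		}
	}
	return -peak;                 // report the amplitude as a positive number
}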

Each pulse can be monitored by checking the Show pulses checkbox:

Red represents the original samples, blue the interpolated pulse, and the black line is the amplitude measured for this pulse.

This also allows the user to adjust the gain of the audio input (from the operating system’s controls).

After each pulse height is measured it is added to a histogram of pulse amplitudes.

Note: The microphone volume directly determines which bin of the histogram a pulse of a given amplitude ends up in.
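Binning itself is straightforward. Here is a sketch with an assumed channel count, where full scale (which is effectively what the microphone gain sets) maps to the last bin:

// Sketch of histogram binning. The number of channels is an assumption;
// amplitudes are in the 0..1 range of the Web Audio samples, so the
// microphone gain decides which bin a given physical pulse ends up in.

const CHANNELS = 512;
const histogram = new Uint32Array(CHANNELS);

function addPulse(amplitude) {
	let bin = Math.floor(amplitude * CHANNELS);
	if (bin >= CHANNELS) bin = CHANNELS - 1;   // clamp full-scale pulses
	if (bin < 0) bin = 0;
	histogram[bin]++;
}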

When data collection is started, the start time is stored.

At first a background histogram is collected by letting the detector run for long enough to give a good representation of the background (radiation).

Then the Store reference button is pressed. This copies the histogram to a reference histogram and the time since start is recorded.

At this time a sample can be placed in front of the detector.

When the Use (histogram) checkbox is checked, the reference histogram, scaled by the time since it was stored, is subtracted from the current histogram, allowing the background to be continuously subtracted. Smoothing can be applied to the reference histogram by adjusting the Smoothing slider.

If the (measurement) clear button is pressed, the reference histogram is transferred to the main histogram, effectively removing the contribution from the sample.
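One plausible way to implement the time-scaled subtraction, assuming a constant background rate (this is my interpretation of the description above, not necessarily the demo's exact formula):

// Sketch of the continuous background subtraction. The reference histogram,
// collected over referenceTime seconds, is scaled to the total elapsed time
// and subtracted, assuming the background rate is constant.

function subtractBackground(histogram, reference, elapsedTime, referenceTime) {
	const scale = elapsedTime / referenceTime;        // constant-rate assumption
	const result = new Float64Array(histogram.length);
	for (let i = 0; i < histogram.length; i++) {
		// Bins may go slightly negative due to counting statistics
		result[i] = histogram[i] - reference[i] * scale;
	}
	return result;
}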

Working demo:

Click here if it does not work

Conclusion

It turns out that you are able to collect largely unaltered real time audio data to be processed by JavaScript code in a regular web browser. The speed at which contemporary JS is executed in a standard and unmodified browser is also really impressive.

The ability to disable interfering sound enhancement features directly from the Audio API is a really important feature since these severely limit the usefulness of the data for technical purposes.

This opens up possibilities for creating lightweight and easily adaptable software that is inherently open source and platform independent.

I imagine this kind of approach can play a major role in empowering citizen science projects, since advanced custom software can now be deployed on a wide range of computers without the users having administrative access or even access to basic system configuration.

For many students, even being allowed to connect a simple Arduino to their student computer can be problematic or sometimes impossible. This severely limits their ability to partake in distributed, large-scale data collection efforts.

Future work

I would like to develop a range of easy-to-build sensors that can be connected and processed using these kinds of non-system-specific methods.