"Real or Junk?" will be back soon

08 August 2022, by Adam McMaster

In November I wrote that we had added a large batch of 400,000 subjects to the Real or junk? workflow on SuperWASP Variable Stars. I'm pleased to report that all of those subjects have now been completed!

It's been about a year and a half since we launched the Real or junk? workflow, and in that time we've managed to filter over 1.1 million light curves, narrowing down the set to about 400,000 real subjects which don't look like random noise. This is amazing progress! This smaller set of subjects then feeds into the main light curve classification workflow, speeding up our search for variable stars.

We're not done yet, though. There is still plenty of data left to filter. But we now have enough light curves labelled as either real or junk that we can train a computer to identify them automatically. SuperWASP Variable Stars team member Hugh Dickinson has already built a computer model which can do that, and we've been testing it out in recent weeks.

The computer model is pretty good at classifying light curves, and most of the time it can correctly separate real data from junk. However, the data is pretty noisy and the model does sometimes make mistakes, so we can't rely on it alone. We don't want to miss anything.

That brings us to what's next for Real or junk? – we're going to put it on pause for a few weeks, and when we bring it back it will be powered by Hugh's computer model. We'll classify everything with the computer model first, and anything it identifies as real will be sent to the main workflow to be classified in full. For anything it identifies as junk, we'll ask you to give us one manual classification (rather than the three we ask for at the moment). If you think it's real (i.e. if you disagree with the computer), we'll send it to be classified in the main workflow.
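
For anyone who likes to see the logic spelled out, here is a rough sketch of that routing in Python. It is only an illustration of the plan described above, not the code we actually run: the function name, the label strings and the destination names are all made up for this example.

```python
from typing import Optional


def route_subject(model_label: str, volunteer_label: Optional[str] = None) -> str:
    """Decide where a light curve goes next (illustrative sketch only).

    model_label and volunteer_label are assumed to be "real" or "junk".
    The returned destination names are hypothetical placeholders.
    """
    # Anything the model calls real goes straight to the main workflow.
    if model_label == "real":
        return "main_workflow"

    # The model said junk: we need one volunteer classification first.
    if volunteer_label is None:
        return "needs_volunteer_check"

    # A volunteer disagreeing with the model rescues the light curve.
    if volunteer_label == "real":
        return "main_workflow"

    # Model and volunteer both said junk, so it is filtered out.
    return "filtered_out"
```

The point of the design is that a light curve is only filtered out when both the computer and a person agree it is junk, which is how one volunteer classification can replace the three we ask for now.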

This way we can maintain the high level of accuracy we've had until now, but without needing as much work from our volunteers. We wouldn't want to waste your time by asking you to do something a computer can do. This also means we will be able to filter out the rest of the junk much more quickly, helping us to find interesting new variable stars even faster!


