PhD student Honglin Yu.

More than one billion people currently use YouTube, and 300 hours of video are uploaded to the video-sharing website every minute.

In this digital age there's a constant need to better understand social media: how it evolves, and the factors behind and benefits of using tools like Twitter, Facebook, Instagram and YouTube.

NCI's supercomputer, Raijin, is helping researchers from The Australian National University to understand the lifecycle of YouTube videos - and predict their popularity.

In collaboration with technology research centre NICTA, Dr Lexing Xie, PhD student Honglin Yu and Dr Scott Sanner have been measuring the YouTube environment for the past four years.

Xie and her team have collected hundreds of thousands of tweets containing YouTube URLs and have used Raijin to process that data.

"The information we collect includes how many views the videos get every day, but one of the difficulties in big data is to work out what data to look at," she says.

"We take Twitter discussions of YouTube videos that point us to the URL, and, if the owners have enabled it, we can get the video's history which is the information we need."

The work is pioneering a new area of science and could lead to commercial and practical benefits for business.

"Traditionally in computer science we understand computer systems, but this research is part of a new area called computational social science where we try to understand how people behave using computational means," says Xie.

"In a real life social network we know that the aggregate behaviour is not a simple reflection of how likely one particular person will behave, because people influence each other and react to things differently.

"Once we understand the aggregate behaviour over a period of time, we hope to understand how people react to media campaigns. Organisations will then be able to understand how to spread the word about something or how to prevent something from getting out.

"In the coming years the hope is to understand how popularity evolves and to nail down all the predictable factors."

Xie says having access to the NCI facilities has given her team the ability to undertake such a data-intensive project.

"We use special algorithms for hundreds of thousands of videos, so it's a job that's quite suitable for NCI.

"If we didn't have the NCI facilities it could take 3-6 months just to finish the computing and we run that part of the research multiple times, so it could take almost a year just to process the data.

"We're very grateful to have a resource like NCI," Xie says.

Honglin Yu will present the paper at the 9th International Conference on Web and Social Media at Oxford University.


This article was first published in ANU Reporter.