Exploring the Twitch network

This work explores and visualizes the Twitch network of broadcasters and followers. Data is obtained using the Twitch API, and a graph of broadcasters and followers is created. The preferential attachment model is then applied to the network to estimate the probability that a new viewer will follow any given broadcaster.

The data for this project was obtained using the Twitch API. The Twitch database is immense, and it is impractical (due to its size and the API rate limits) to obtain a complete snapshot of the data. Instead, aggregate data was obtained that is representative of the network at the time of collection. Additionally, because the Twitch network is actively growing, the data presented here will not be fully representative of the network as it stands today. These two limitations introduce errors, which will appear in the results as unexplained variance.

For this work, data was obtained on broadcasters. For each broadcaster, the data included the users following the broadcaster's channel and the channels the broadcaster was following. A visualisation of a subset of this data is given below.
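As a minimal sketch of how such follow data could be turned into a graph, the snippet below builds two adjacency maps from (follower, channel) pairs. The function name `build_follow_graph`, the pair format, and the sample user names are illustrative assumptions, not the format returned by the Twitch API or the structure used in this work.

```python
from collections import defaultdict


def build_follow_graph(follow_pairs):
    """Build adjacency sets from (follower, channel) pairs.

    Hypothetical helper: the input format is an assumption made for
    illustration only.
    """
    following = defaultdict(set)  # user -> channels that user follows
    followers = defaultdict(set)  # channel -> users following that channel
    for follower, channel in follow_pairs:
        following[follower].add(channel)
        followers[channel].add(follower)
    return following, followers


# Made-up sample data; these are not real Twitch accounts.
sample_pairs = [
    ("viewer_a", "streamer_x"),
    ("viewer_b", "streamer_x"),
    ("viewer_b", "streamer_y"),
    ("streamer_y", "streamer_x"),
]

following, followers = build_follow_graph(sample_pairs)

# Follower count (in-degree) per channel.
follower_counts = {channel: len(users) for channel, users in followers.items()}
```

Keeping both directions of the relation makes it cheap to look up either who follows a channel or which channels a user follows.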

Using this data, we can model the growth of the Twitch network over time using the preferential attachment model (ref Barabási). In general, preferential attachment means that the more connected a node is in a network, the more likely it is to gain new edges from incoming nodes. This can be defined formally by the equation

$p_i = \dfrac{k_i}{\sum_j k_j}$

Here $p_i$ represents the probability that a new node introduced to the network will connect to node $i$, $k_i$ is the degree of node $i$, and the sum runs over all existing nodes $j$. Explain this more, reference a paper. Talk about how scale free models are used to reference social networks. Explain scale free models and why they are good for social networks. Reference a paper.
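To make the formula concrete, here is a small sketch that computes $p_i = k_i / \sum_j k_j$ for every node from a mapping of node to degree. The function name `attachment_probabilities` and the example degrees are assumptions chosen for illustration.

```python
def attachment_probabilities(degrees):
    """Return p_i = k_i / sum_j k_j for each node in `degrees`.

    `degrees` maps a node label to its degree k_i (hypothetical input format).
    """
    total = sum(degrees.values())
    if total == 0:
        raise ValueError("at least one node must have a non-zero degree")
    return {node: k / total for node, k in degrees.items()}


# Made-up degrees for two broadcasters.
degrees = {"streamer_x": 3, "streamer_y": 1}
probs = attachment_probabilities(degrees)
# streamer_x has 3 of the 4 total degree, so p = 0.75; streamer_y gets 0.25.
```

The probabilities sum to one by construction, so they can be used directly as sampling weights when a new node chooses where to attach.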

Visualizing the follower counts, we can see a large disparity: a few broadcasters have very many followers, while most have very few. #add a histogram with visuals.
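Before plotting, the disparity can be checked numerically by tabulating how many broadcasters share each follower count. The sketch below assumes the follower counts are available as a plain list of integers; the function name and the sample values are made up for illustration and are not the collected data.

```python
from collections import Counter


def follower_count_distribution(follower_counts):
    """Return sorted (follower_count, number_of_broadcasters) pairs."""
    return sorted(Counter(follower_counts).items())


# Made-up follower counts: most channels are small, one is very large.
sample_counts = [1, 1, 1, 1, 2, 2, 5, 120]
distribution = follower_count_distribution(sample_counts)
# [(1, 4), (2, 2), (5, 1), (120, 1)]
```

These pairs are the values a histogram of follower counts would display.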

We can further illustrate this by performing a log-log transformation on the data. #show a log log plot

Here we can more clearly see the downward trend in the data. Modeling this data as a power law, we get the following parameters with the following confidence... #show the log log plot with a fitted line.
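One simple way to obtain power-law parameters is an ordinary least-squares line through the log-log points, where the slope estimates the exponent. The sketch below is an assumption about how such a fit could be done, not necessarily the fitting procedure used for the results here. It does not produce confidence intervals. Least squares on log-log data is only a rough diagnostic, and maximum-likelihood estimators are generally considered more reliable for power laws. The function name and the synthetic data are illustrative.

```python
import math
from collections import Counter


def fit_power_law_loglog(counts):
    """Fit log10(frequency) = intercept + slope * log10(count) by least squares.

    `counts` is a list of positive follower counts (hypothetical input format).
    Returns (slope, intercept); a power law appears as a straight line with
    negative slope in log-log space.
    """
    freq = Counter(c for c in counts if c > 0)
    if len(freq) < 2:
        raise ValueError("need at least two distinct positive counts")
    xs = [math.log10(k) for k in freq]
    ys = [math.log10(n) for n in freq.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic data constructed so that frequency(k) = 1000 / k**2 exactly.
synthetic = [1] * 1000 + [2] * 250 + [5] * 40 + [10] * 10
slope, intercept = fit_power_law_loglog(synthetic)
# For this synthetic input the slope is close to -2 and the intercept close to 3.
```

On real data the points will scatter around the line, especially in the sparse tail, which is one reason a straight-looking log-log plot is suggestive rather than conclusive.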

While this is not definitive proof of a power law, it indicates that a power law may be present. More explanation of why this is, point to lack of data, show more conclusions. Additional unobserved factors, such as stream quality, streamer ethnicity, location, language, gender, and timezone, would also contribute to the error of the model above.

Applying the preferential attachment model to the data above, we can visualise the growth of the Twitch network over time. The visualisation of this growth is given below. Here 100 nodes are added to the graph, representing 100 new users joining the Twitch platform.
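The growth step can be sketched as a small simulation in which each new node picks one existing node with probability proportional to its degree. Several details here are assumptions rather than part of the original analysis: each new user creates exactly one edge, both endpoints gain one degree, and the starting degrees, node labels, and function name `grow_network` are made up for illustration.

```python
import random


def grow_network(degrees, new_nodes=100, seed=0):
    """Add `new_nodes` nodes by preferential attachment, one edge each.

    `degrees` maps node -> degree and is left unmodified. Returns the updated
    degree mapping and the list of (new_node, target) edges that were added.
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    degrees = dict(degrees)
    edges = []
    for step in range(new_nodes):
        nodes = list(degrees)
        weights = [degrees[node] for node in nodes]
        # Pick a target with probability k_i / sum_j k_j.
        target = rng.choices(nodes, weights=weights, k=1)[0]
        new_node = f"new_user_{step}"
        degrees[target] += 1
        degrees[new_node] = 1
        edges.append((new_node, target))
    return degrees, edges


# Made-up starting degrees for two broadcasters.
initial = {"streamer_x": 3, "streamer_y": 1}
final_degrees, new_edges = grow_network(initial, new_nodes=100, seed=42)
```

Because newly added nodes also enter the pool with degree 1, later arrivals can attach to earlier arrivals as well as to the original broadcasters.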