Kusto Detective Agency Season 1 (Challenge 2)#
In this challenge, we need to spot election fraud. There are 4 candidates but a highly suspicious victory… This challenge introduces the bin function, which groups values into fixed-size buckets (bins). Finding high activity in short periods is very relevant to cyber. Is someone brute forcing after hours? Trying to DDoS? Have 500 errors spiked? Who knows… we do!
Solution#
Investigating the Data#
We’ve brought in the Votes table, so select the first 1000 rows to take a look.
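A minimal sketch of that first look, assuming the challenge's Votes table is already loaded in the current database:

```kusto
// Grab an arbitrary sample of 1000 rows to get a feel for the columns
Votes
| take 1000
```

Note that `take` does not guarantee any particular ordering, so it is only good for eyeballing the shape of the data.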
Looking at the first 1000, we should start with the low-hanging fruit. Is the voter hash repeated?
9 collisions in 5 million votes; I’ll put that down to expected (and not enough for voter fraud). It could also be that someone is generating hashes randomly to brute force.
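One way to run that duplicate check is sketched below. The column name `voter_hash_id` is an assumption here (it does not appear in the final script), so swap in whatever the hash column is called in your copy of the table:

```kusto
// Count how many voter hashes appear more than once
// NOTE: voter_hash_id is an assumed column name
Votes
| summarize Occurrences = count() by voter_hash_id
| where Occurrences > 1
| count
```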
How about that voter IP column?
Normally I would think we’re on to something here, but the hints tell us that votes are done via proxy in local areas. So once again this is probably expected.
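A quick way to see how votes are spread across the `via_ip` column might look like this (a sketch, not necessarily the exact query used):

```kusto
// Votes per proxy IP, busiest first
Votes
| summarize VoteCount = count() by via_ip
| order by VoteCount desc
```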
Introducing Bias#
Let’s introduce some bias and assume fraud did happen. In an election it’s probably not the most moral thing, but in cyber there is a whole mindset of assuming you’ve been breached.
Let’s see where the votes came from, and order them by time.
OK, that’s suspicious. So many votes in the first second of the polls opening, all from the same address (which also happens to be the first one in the list). 10 seconds later it swaps to the next IP in the list and the same action repeats. Clearly something is sus here…
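A rough sketch of that time-ordered view, using the `Timestamp`, `via_ip` and `vote` columns that appear in the final script:

```kusto
// Earliest votes first, keeping only the columns of interest
Votes
| project Timestamp, via_ip, vote
| order by Timestamp asc
| take 1000
```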
Calculating the fraud#
So, someone is loading as many votes as possible to a server every 10 seconds, then moving on to the next. Let’s find the cases where a single IP received more than 10 votes in a 10-second window. We can then sum these and treat them as fake votes.
Now, there may be perfectly legitimate votes in these windows that we are throwing out as well. In the grand scheme, these are rounding errors, so I’m not too worried about this.
Time to introduce the bin function, which does exactly that: it rounds each timestamp down to the start of its 10-second window, so votes can be counted per window.
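To make the bucketing concrete, here is a small self-contained sketch. The three rows are made-up sample data for illustration only, not values from the challenge:

```kusto
// Hypothetical sample rows: two votes fall in the 00:00:00 window, one in 00:00:10
datatable(Timestamp:datetime, via_ip:string)
[
    datetime(2022-01-01 00:00:01), "10.0.0.1",
    datetime(2022-01-01 00:00:09), "10.0.0.1",
    datetime(2022-01-01 00:00:12), "10.0.0.1"
]
| extend Window = bin(Timestamp, 10s)              // round down to a multiple of 10s
| summarize VoteCount = count() by via_ip, Window  // one row per IP per window
```

The same `bin(Timestamp, 10s)` expression can go straight into the `by` clause of `summarize`, which is how the final script uses it.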
2.5 million votes? That looks like fraud to me.
Removing the fraud#
We can now work out how many votes each nominee had. Assuming there was no fraud for the others, their totals are just their own counts, and we reduce the Poppy votes by the fraud amount.
We have to convert to a double, multiply by 100, then divide by the total to get the percentage (dividing one integer by another drops the fractional part).
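A tiny illustration of why the conversion matters, using arbitrary numbers rather than the real vote counts:

```kusto
// Integer division truncates; converting one operand to double keeps the fraction
print IntegerShare = 1 / 4,                   // 0
      DoublePercent = todouble(1) * 100 / 4   // 25
```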
I really should pass the variable through here instead of running it in two separate calculations, but I honestly had trouble here.
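For what it's worth, one possible way to pass the value through is `toscalar`, which turns a single-value tabular result into a scalar that can be used in later expressions. This is only a sketch of an alternative, not the script used for the answer, and the names `adjusted` and `total` are made up for illustration:

```kusto
// Fake-vote total as a scalar instead of a hard-coded number
let fraud = toscalar(
    Votes
    | where vote == 'Poppy'
    | summarize count() by via_ip, bin(Timestamp, 10s)
    | where count_ > 10
    | summarize sum(count_));
// Per-candidate counts, with the fake votes removed from Poppy
let adjusted = Votes
    | summarize Count = count() by vote
    | extend Count = iff(vote == 'Poppy', Count - fraud, Count);
let total = toscalar(adjusted | summarize sum(Count));
adjusted
| project vote, Percent = 100.0 * Count / total
```

This gives one row per candidate rather than one column per candidate, so the output shape differs from the final script.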
Final Script#
#let fraud=Votes
#| where vote == 'Poppy'
#| summarize count() by via_ip, Length=bin(Timestamp, 10s)
#| where count_ > 10
#| summarize fake=sum(count_)
#;
#fraud //2497919
#
#Votes
#| summarize Kastor=countif(vote=='Kastor'), Gaul=countif(vote=='Gaul'), Poppy=countif(vote=='Poppy'), Willie=countif(vote=='Willie')
#| project K=Kastor, G=Gaul, P=(Poppy-2497919), W=Willie
#| project K, G, P, W, Total=K+G+P+W
#| project Kastor=todouble(K)*100/Total, Gaul=todouble(G)*100/Total, Poppy=todouble(P)*100/Total, Willie=todouble(W)*100/Total
#// remember these aren't in the same order as the answer request!