Kippo Log Analysis

Jose Nazario, September, 2013 @jnazario

I analyzed the logs from my Kippo SSH honeypot, which runs on a single broadband IP address (a /32) in the US, and found some interesting tidbits. The data covers about ten months of activity on that personal server. Processing was done with some custom scripts I wrote, one of which is available (see below).

What is Kippo?

Kippo is a medium-interaction SSH honeypot. It presents attackers with a fake (simulated) Debian Linux system, including an emulated Bash shell and a filesystem to poke around in. The goal is to learn about the hackers on your system (in true honeynet style), find their tools and see what they're up to. I have yet to find a true 0-day, but that would be one of the goals of this activity. Typically I find standard SSH brute force botnets.

Daily Activity

One of the key questions is how many attempts we see on a daily basis. I decided a simple line graph showing connections, failed and successful, per day would be a good way to quickly judge. The Y axis is a log scale because otherwise the failures drown out the successes. I see that some other people get tons of successful inbound connections, and I wonder how. Do they use obvious passwords (see below)? Should I do that to get more data?

Anyhow, you can also see sporadic activity from this graph, and of course the time range I'm studying.
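A minimal sketch of how per-day tallies for a graph like this could be produced. It is not the script behind the graph, and it assumes Kippo's text log records each attempt as a line starting with a date and ending in "login attempt [user/password] succeeded" or "failed"; check that against your own logs. The function name is made up for illustration.

```python
import re
from collections import defaultdict

# Assumed line shape (verify against your own kippo.log):
# 2013-09-01 12:34:56+0000 [SSHService ssh-userauth on HoneyPotTransport,12,198.51.100.7] login attempt [root/123456] failed
LOGIN_RE = re.compile(
    r"^(?P<day>\d{4}-\d{2}-\d{2}) .*login attempt \[.*\] (?P<result>succeeded|failed)\s*$"
)

def daily_login_counts(lines):
    """Return {day: {"succeeded": n, "failed": m}} for login attempts (hypothetical helper)."""
    counts = defaultdict(lambda: {"succeeded": 0, "failed": 0})
    for line in lines:
        match = LOGIN_RE.match(line)
        if match:
            counts[match.group("day")][match.group("result")] += 1
    return dict(counts)
```

Feeding it an open log file (for example `daily_login_counts(open("kippo.log"))`, where the path is whatever your install uses) gives one row per day, ready to plot on a log-scale Y axis.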

Client IP Locations

Obviously one type of analysis you would like to do is to look at where the clients are coming from. To produce the map below, Kippo logs were analyzed using the Team Cymru IP mapping service, a free service that can yield some basic information about IPs. Distinct IPs were mapped to countries and the counts accumulated. Note that an IP that keeps on trying still counts as one distinct IP.

You can hover over individual countries to see the unique IP counts.

Origin ASNs are also interesting because they can help you spot "bad" or "unclean" ASNs pretty quickly. Some of the usual suspects are here. I expected this list to be dominated by residential and hosting farm ASNs.
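The country and ASN tallies described above can be sketched as follows. This assumes you have already saved the pipe-delimited verbose output of a Team Cymru bulk lookup as text (columns: AS, IP, BGP Prefix, CC, Registry, Allocated, AS Name); the function name and the sample networks are hypothetical, and the column layout should be checked against what the service actually returns to you.

```python
from collections import Counter

def tally_origins(cymru_text):
    """Count distinct IPs per country and per origin ASN (hypothetical helper)."""
    seen_ips = set()
    by_country = Counter()
    by_asn = Counter()
    for line in cymru_text.splitlines():
        # Split into at most 7 columns so a "|" inside the AS name is preserved.
        fields = [field.strip() for field in line.split("|", 6)]
        if len(fields) < 7 or fields[0] == "AS":
            continue  # skip banner lines and the column header
        asn, ip, _prefix, country = fields[:4]
        if ip in seen_ips:
            continue  # an IP that keeps trying still counts once
        seen_ips.add(ip)
        by_country[country] += 1
        by_asn[(asn, fields[6])] += 1
    return by_country, by_asn
```

`by_country.most_common()` then gives the numbers for a map, and `by_asn.most_common(10)` a quick list of the busiest origin networks.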

Attempted Credentials

People try to get into Kippo by brute forcing their way in, that is, by guessing username and password combinations. This can give you insights into various things like default passwords, compromised accounts, and the like.

You can see how dead simple some of the tools are: they keep trying stupidly simple passwords. It's tempting to put the most obvious password(s) in your Kippo pot, but I worry about being too obvious.
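A minimal sketch of tallying the attempted credentials, again assuming the "login attempt [user/password] succeeded|failed" line shape in Kippo's text log. The function name is made up for illustration. Usernames containing "/" would be split in the wrong place, since the log format itself is ambiguous there.

```python
import re
from collections import Counter

# The username is everything before the first "/"; the password may itself contain "/" or "]".
CRED_RE = re.compile(
    r"login attempt \[(?P<user>[^/]*)/(?P<password>.*)\] (?:succeeded|failed)\s*$"
)

def credential_counts(lines):
    """Return Counters of usernames, passwords and (user, password) pairs (hypothetical helper)."""
    users, passwords, pairs = Counter(), Counter(), Counter()
    for line in lines:
        match = CRED_RE.search(line)
        if match:
            user, password = match.group("user"), match.group("password")
            users[user] += 1
            passwords[password] += 1
            pairs[(user, password)] += 1
    return users, passwords, pairs
```

Calling `most_common(20)` on any of the three counters gives a ranked list of what the brute forcers try first.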

Session Commands

As part of my analysis I decided to perform some stochastic analysis on the commands people issue in the honeypot. Basically it works like this: for every command in a sequence, figure out the probability that they'll issue some specific command next. You can imagine using it to build information about compromises (e.g. shell-based anomaly detection), reactions to intruders, or improving the honeypot (for example "why do people exit?"). I think this warrants some further study. The script is available if you wish to explore it.

D3 work done by Andrea de Pasquale, many thanks!
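The next-command probabilities described above amount to a first-order Markov chain over commands. The sketch below is one way to compute them and is not the script mentioned in the text. It assumes each session has already been reduced to an ordered list of command strings; the function name and the "<start>" and "<end>" markers are inventions for illustration.

```python
from collections import Counter, defaultdict

def transition_probabilities(sessions):
    """Map each command to {next_command: probability} across all sessions (hypothetical helper)."""
    counts = defaultdict(Counter)
    for commands in sessions:
        # Pad each session so the first command and the point of leaving are modeled too.
        path = ["<start>"] + list(commands) + ["<end>"]
        for current, following in zip(path, path[1:]):
            counts[current][following] += 1
    return {
        current: {
            nxt: n / sum(followers.values()) for nxt, n in followers.items()
        }
        for current, followers in counts.items()
    }
```

Looking at which commands have a high probability of being followed by "<end>" is one way to approach the "why do people exit?" question.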

Session Durations

One of my key metrics is how long I can entice an attacker to hang around the fake environment. My assumption is that a more convincing environment will keep them using it longer, which means I can get more information about them and their methods, tools, etc. By analyzing my logs, I can see that most of the sessions are very short, which makes me wonder: do they know it's a honeypot? Or will they come back later?
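A rough sketch of how session durations could be pulled from the text log. It assumes Kippo writes a "New connection: ip:port (ip:port) [session: N]" line when a session starts and a "[HoneyPotTransport,N,ip] connection lost" line when it ends, with a "YYYY-MM-DD HH:MM:SS" timestamp at the start of each line; those shapes, and the function name, are assumptions to verify against your own logs.

```python
import re
from datetime import datetime

START_RE = re.compile(r"New connection: .* \[session: (?P<sid>\d+)\]")
END_RE = re.compile(r"\[HoneyPotTransport,(?P<sid>\d+),[^\]]*\] connection lost")

def session_durations(lines):
    """Return a list of session lengths in seconds (hypothetical helper)."""
    started = {}
    durations = []
    for line in lines:
        start = START_RE.search(line)
        end = END_RE.search(line)
        if not (start or end):
            continue
        try:
            # Only the first 19 characters are parsed; the UTC offset is ignored,
            # which assumes the whole log uses one time zone.
            stamp = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue
        if start:
            # Session ids may restart from 0 when the honeypot restarts; the
            # newest start simply replaces any stale one with the same id.
            started[start.group("sid")] = stamp
        elif end.group("sid") in started:
            begun = started.pop(end.group("sid"))
            durations.append((stamp - begun).total_seconds())
    return durations
```

Sorting the result and looking at the median, or bucketing it into a histogram, shows how heavily the sessions skew toward very short visits.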

Repeat Visitors

Much to my surprise I had some repeat visitors, many in fact. I had expected to find very few, but it may be zombies that don't know not to scan me again, or it may be someone who thinks they have a known password from an automated bruter. Since many of them made more than two visits, I'm guessing it's the former: zombies that just don't know not to come back, an inefficiency on their part. The most frequent visitor, a host from Brazil, came back 14 times. Several others came back far more than twice, some as many as ten times in this time period. I can't be sure it's the same host behind the IP each time; all I know is that the same IP came back on more than one occasion. This bodes well for SSH blacklists, whether maintained locally or shared in a distributed fashion.
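A small sketch of how repeat visitors could be counted. The text does not define what counts as a visit, so this assumes one visit per IP per calendar day, and it assumes the "New connection: ip:port ..." line shape in Kippo's text log; both choices, and the function name, are illustrative only.

```python
import re
from collections import defaultdict

CONN_RE = re.compile(
    r"^(?P<day>\d{4}-\d{2}-\d{2}) .*New connection: (?P<ip>\d{1,3}(?:\.\d{1,3}){3}):\d+ "
)

def repeat_visitors(lines, min_visits=2):
    """Return [(ip, days_seen)] for IPs seen on at least min_visits distinct days (hypothetical helper)."""
    days_by_ip = defaultdict(set)
    for line in lines:
        match = CONN_RE.match(line)
        if match:
            days_by_ip[match.group("ip")].add(match.group("day"))
    repeats = [
        (ip, len(days)) for ip, days in days_by_ip.items() if len(days) >= min_visits
    ]
    # Most frequent visitors first; ties broken by IP string for stable output.
    return sorted(repeats, key=lambda item: (-item[1], item[0]))
```

The same per-IP tally is the kind of input a local blacklist would need, for example by blocking any address that shows up on more than a chosen number of days.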

Conclusions

This study of my Kippo logs was designed to look at a key question, namely: how can I get users to engage with the honeypot more? We have this mental model, informed by people like Rowe (2004), Qassrawi & Zhang (2010), and others, that a more deceptive, realistic honeypot will keep people engaged longer, yielding more information. To that end, what I was hoping to find out was what interactions people have with the honeypot, and whether I can figure out why they leave. Studying the effectiveness of deception by honeypots is not new; see Rowe (2006). Studies like those from Duong (2006) suggest that studying honeypots can be an effective way to study attacks, but we know that some attacks (the interesting ones) require long term setup and action on the part of the attacker, and so a realistic honeypot should help capture that data.

I hope that this brief data dump helps someone further study Kippo and other honeypots, and improve them to gather more data about attacks and attackers. Previous studies, such as Jordan et al. (2004), have helped to inform better decoys; hopefully someone will update that work. An alternative comes from Neagoe and Bishop (2006), where inconsistencies in deception can be useful to draw out attackers. I am going to keep on playing with this honeypot, and others, and experiment along the way.

References

Rowe, Neil C. "A model of deception during cyber-attacks on information systems." Multi-Agent Security and Survivability, 2004 IEEE First Symposium on. IEEE, 2004.
Qassrawi, Mahmoud T., and Zhang Hongli. "Deception methodology in virtual Honeypots." Networks Security Wireless Communications and Trusted Computing (NSWCTC), 2010 Second International Conference on. Vol. 2. IEEE, 2010.
Rowe, Neil C. "Measuring the effectiveness of honeypot counter-counterdeception." System Sciences, 2006. HICSS'06. Proceedings of the 39th Annual Hawaii International Conference on. Vol. 6. IEEE, 2006.
Duong, Binh T. Comparisons of attacks on honeypots with those on real networks. Diss. Monterey, California. Naval Postgraduate School, 2006.
Jordan, C. J., Q. Zhang, and J. Roves. "Determining the strength of a decoy system: a paradox of deception and solicitation." Information Assurance Workshop, 2004. Proceedings from the Fifth Annual IEEE SMC. IEEE, 2004.
Neagoe, Vicentiu, and Matt Bishop. "Inconsistency in deception for defense." Proceedings of the 2006 workshop on New security paradigms. ACM, 2006.