This is the final blog post about the Thug Distributed Task Queuing project. Here I will describe the distributed feature we have added to the existing Thug project, which makes analyzing URLs easy and efficient.
Previously Thug worked as a stand-alone tool and did not provide any way to distribute URL analysis tasks to different workers. For the same reason it was not able to analyze how attacks differ according to the user's geolocation (unless it was provided with a set of differently geolocated proxies, obviously). After implementing this project we are able to solve both problems by creating a centralized server that is connected to all the Thug instances running across the globe and distributes URLs (potentially according to geolocation analysis requirements). The clients then consume the tasks distributed by the centralized server, process them, and store the results in a database.
On the server we can manage all the clients (workers) and distribute URLs on the basis of client geolocation: if we want to check how a URL behaves in a particular country, we put that URL in that country's queue, and a client connected from that country will process the URL and send back the result. So we are not only able to distribute URLs among clients running all over the world, but we can also analyze attacks targeting particular countries.
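The routing idea can be sketched in a few lines of Python (the queue names and the helper function are illustrative, not the actual ThugD API; in ThugD the real dispatch happens through Celery queues):

```python
# Minimal sketch of geolocation-based routing. Queue names are illustrative.
GENERIC_QUEUE = "generic"

def pick_queue(url, country=None):
    """Return the queue a URL should be published to.

    A URL with a geolocation requirement goes to that country's queue,
    so only workers connected from that country consume it; otherwise
    any worker listening on the generic queue may pick it up.
    """
    return country.lower() if country else GENERIC_QUEUE

# A server-side dispatch loop would then publish each URL to pick_queue(...).
```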
Here are working demos of Flower (the Celery monitoring tool) showing workers processing tasks:
Workers connected from India and processing tasks:
Description of tasks that are running or completed:
Workers are the clients, i.e. Thug instances running all over the world. Each is connected to two types of queues: the generic queue and its nation queue (so an Indian client would be connected to the India queue, and so on). Whenever the server puts URLs in the queues, the workers connected to those queues consume the URLs, process them, and send the results back to the server for further processing.
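With a standard Celery setup, subscribing a worker to both queues is just a matter of the `-Q` option (the app module name `thugd` and the queue names here are my illustrative assumptions, not the exact ThugD invocation):

```shell
# An Indian worker listens on its nation queue plus the generic queue.
celery -A thugd worker -Q generic,india --loglevel=info
```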
Here I want to describe the optimizations I have worked on and am still working on. I built two other prototypes with some optimizations, and they also live in the GitHub repo. In the first prototype I tried to distribute URLs according to each client's system performance, i.e. if a client's system is very fast we give it more URLs than the others. This was done using Redis: every 2 minutes (for example) each worker writes a performance value into a Redis sorted set, and whenever the server wants to distribute URLs it queries the sorted set and allocates more URLs to the clients with higher performance values (as a better performance value means a better system). This way we might get quicker responses from the clients, but a problem came up: it became difficult to also distribute URLs according to geolocation.
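Roughly, the first prototype's allocation logic looks like this (a toy sketch with illustrative names and scores; in the prototype the scores live in a Redis sorted set, written with ZADD and read back highest-first):

```python
# Toy sketch of the prototype-1 allocation. In the prototype each worker
# periodically writes its performance score into a Redis sorted set, and
# the server reads the scores back highest-first to weight allocation.

def allocate(urls, scores):
    """Split a batch of URLs among workers proportionally to their score."""
    total = sum(scores.values())
    ranked = sorted(scores, key=scores.get, reverse=True)
    shares = {w: round(len(urls) * scores[w] / total) for w in ranked}
    out, i = {}, 0
    for w in ranked:
        out[w] = urls[i:i + shares[w]]
        i += shares[w]
    # Rounding leftovers go to the fastest worker.
    out[ranked[0]].extend(urls[i:])
    return out

# A worker scoring 9.0 gets three of four URLs; one scoring 3.0 gets one.
allocate(["u1", "u2", "u3", "u4"], {"worker-a": 9.0, "worker-b": 3.0})
```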
The second prototype's optimization was very simple: we just increased the prefetch value of the workers with better performance values, so the clients whose systems are better than the others will prefetch, and therefore process, more URLs than the others.
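In recent Celery versions the prefetch value is controlled by the `worker_prefetch_multiplier` setting, so the second prototype boils down to something like this in a fast worker's configuration (the value 8 is purely illustrative):

```python
# celeryconfig.py fragment for a high-performance worker: each worker
# process reserves 8 tasks at a time instead of Celery's default of 4,
# so faster machines keep a larger backlog of URLs to analyze.
worker_prefetch_multiplier = 8
```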
That’s all I wanted to share about my project. All in all, this was a super exciting summer, and I enjoyed and learned a lot by participating in GSoC.
I want to thank everyone who helped me in completing my project:
First and most important is Angelo sir (my mentor), who helped me a lot even during his busy times and answered my each and every dumb query. Thanks a lot sir, he really is an amazing guy.
Then I want to thank Sebastian sir (my backup mentor) and Kevin sir. I had some great discussions with Sebastian sir which helped me a lot with the project, and Kevin sir worked as an unofficial mentor: he helped me a lot in working with Celery and advised me a lot while implementing the project.
I also want to thank David sir for organizing and managing the Honeynet GSoC so well, and I would also like to thank Tan Kean Siong sir for starting an introduction mailing list that gives students a platform to introduce themselves.
Let’s always keep on working!
The ThugD GitHub repo can be found at https://github.com/Aki92/Thug-Distributed.
More details and documentation about the project can be found at http://aki92.github.io/Thug-Distributed/.