So, the time has come for me to leave India and go to a foreign land (I hope it is not for ever but nevertheless, its for a substantial time). This is probably the last post I am writing before going to Berkeley. Though I will be in home till next Saturday, I believe next week will be very hectic and I shall have no time to write posts. So, what will I miss about India? Well, first thing is India itself ! Then staying in home for the past 2 months, I have again got used to good food which I guess might become a rarity once I am in US. Adjustment to a new system, new set of people, new food, new weather etc. I guess the only thing which I am accustomed to and which will also be there in US is myself. But then, changes are inevitable. So, the best thing is to not resist it and better accept it with a smile. After all, 4 years back, I did not expect that I will have any regret when I leave IITK, but when I was actually leaving IITK, I did feel sad. So, may be 4-5 years later, when I leave Berkeley, I might feel sad for Berkeley as well.
Enough of emotional stuff, lets get back to business. I always wanted to know a bit about Approximation algorithms but never had the opportunity to learn about it in too much detail. Last week, apart from thinking about some circuit complexity problems, I read about two simple but elegant approximation algorithms. The first one is a greedy algorithm for Set-Cover problem and the second one is the Goemans-Williamson SDP algorithm for MAX-CUT. The first one has a simpler description and a somewhat simpler proof, so let me explain that. But before that, let me start with Approximation algorithms. Since the seminal works of Cook, Levin and subsequently Karp, it is known that there are problems which have efficiently verifiable solutions but are not solvable in polynomial time under a widely believed assumption (namely P<>NP). Today there are literally thousands of problems (of theoretical as well as practical importance) which are known to be NP-complete i.e. unless P=NP, these problems cannot be solved in polynomial time. Hence, the corresponding optimization problems (i.e. maximization and minimization problems) also cannot be solved in polynomial time (unless again P=NP). So, we should give up the hope of getting efficient algorithms for these problems. However, it may be the case that though you cannot have an exact algorithm, you can have an efficient algorithm which returns an answer which is guaranteed to be close to the optimal value (for example, in a maximization problem, it may happen that the answer is always within a factor of 2 of the optimum). These algorithms will be of tremendous importance and which is what has given rise to the flourishing field of approximation algorithms and its complexity sibling – Hardness of approximation (and of course the PCP theorem). I will now describe a
approximation algorithm for set-cover. The algorithm is very simple though the analysis is tricky.
First, I will describe the problem. Let
be a collection of subsets of set
. Without loss of generality, we may also assume that
. Goal is to find the minimum number of subsets
such that
. In other words, find the minimum number of subsets in the collection
such that their union is
itself. For example, let
. Then the minimum set cover is
. The algorithm for this is very simple – It is a greedy algorithm.
Let
and
. Find the set
such that size of
is largest among all sets in
. Let
and
. If
covers
, then stop else start the algorithm with new
and
. So, it is a very simple greedy algorithm for set cover. It is also clear that the algorithm runs in polynomial time. Now comes the tricky part – analysis of the approximation.
The first trick is that it is rather hard to analyze the algorithm in terms of the optimal solution. Rather it is easier to analyze the optimal solution in terms of the algorithm. Let there be
iterations in the algorithm and in the
iteration, let set
be added to
. Also, just prior to adding the set
, let size of
be
. Hence, we can see that for adding
elements to the set cover, we are spending a cost of
per element. Let
denote the cost spent for element
(as described above). Now consider any arbitrary set
. Let
where the order is defined in the order in which elements are added to the set cover by the greedy algorithm with ties being broken arbitrarily. Consider the element
. Clearly, by the way the algorithm works, we can say that
. This immediately implies that
. The last sum can be bounded by
. Now let
be the optimum solution to set cover. Clearly, the optimum solution has size
. Further
. This inequality follows from the fact that
is a set cover. However, we also know that
. Here
. It is also easy to see that the way
is defined,
is the cost of the solution by the greedy algorithm (which we denote by
. Hence, by the aforementioned inequalities, we can conclude that
. Hence, the greedy algorithm is
approximation algorithm.
This is in fact the best that can be done i.e. Feige proved that unless
(It is almost as bad as P=NP), Set cover is hard to approximate to
for any
. Anyway, that is for another day. As of now, I am done. Next post from Berkeley. Till then, bye bye.