Friday, January 4, 2013

Topological Sort and Strongly Connected Components

Topological Sorting and finding Strongly Connected Components are classical problems on Directed Graphs. Here we will see how we can do Topological Sorting by using DFS and Find Strongly Connected Components using Kosaraju's Algorithm.
Topological Sorting Algorithm:
1) Start with any node and perform a DFS on the graph marking visited nodes.
2) If a node has no unvisited neighbours, add it to the front of a linked list.
3) Repeat till all nodes are visited.
Instead of using a linked list we just push_back normally into a C++ vector. The final vector will give reverse of the topologically sorted order.
Strongly Connected Components Algorithm:
1) Do Topological Sort.
2) Find transpose of the graph: It is defined as - if (u,v) is an edge in original graph, (v,u) is an edge in the transpose
3) Do a DFS on the transpose of the graph in topologically sorted order.
4) Each vertex of the so formed depth-first forest is a separate strongly connected component. Mark them, count the number of them etc.
vector<int> a[1005],atr[1005],topo;
bool seen[1005];
int comp[1005],indeg[1005];
int n,c;

void dfs(int node) // Topological Sorting
{
 seen[node]=true;
 REP(i,0,a[node].size())
 {
  if(seen[a[node][i]]==false)
   dfs(a[node][i]);
 }
 topo.push_back(node);
}

void dfst(int node)
{
 seen[node]=true;
 comp[node]=c;
 REP(i,0,atr[node].size())
 {
  if(seen[atr[node][i]]==false)
   dfst(atr[node][i]);
 }
}

int SCC()
{
 int result=0;
 memset(seen, false, sizeof seen);
 REP(i,1,n+1)
 {
  if(seen[i]==false)
   dfs(i);
 }
 memset(seen,false,sizeof seen);
 memset(comp,0,sizeof comp);
 c=0;
 for(int i=topo.size()-1;i>=0;i--)
 {
  if(seen[topo[i]]==false)
   {dfst(topo[i]);c++;}
 }
 return c;
}
Practice Problem:
CAPCITY

Kruskal's Algorithm for Minimum Spanning Tree

Kruskal's Algorithm is a Greedy algorithm which calculates the Minimum Spanning Tree of an undirected graph. It works as follows:
1) Sort the edges according to their weight.
2) Choose the edge with minimum weight
3) Take the next minimum edge - if it does not form a cycle with the existing edges in the MST, add it to the MST. Else, go to the next edge and repeat this step.
To check whether the next edge does not form a cycle with the existing edges, we use a Data Structure called Union-Find or Disjoint Set Data Structure. This can be implemented using arrays. To keep complexity minimum, we use Union-By-Rank and Path Compression.
Hence overall complexity of the algorithm is O(ElogE) for sorting + O(E) for the Disjoint-Set Data Structure operations = Overall O(ElogE).
#define MAXN 10000 // number of edges

struct node
{
    int x;
    int y;
    int w;
}e[MAXN];

int id[MAXN],sz[MAXN];

int root(int i)
{
    while(i!=id[i])
    {
        id[i]=id[id[i]];
        i=id[i];
    }
    return i;
}

void qunion(int p, int q)
{
    int i = root(p), j=root(q);
    if(sz[i] < sz[j]) {id[i]=j; sz[j]+=sz[i];}
    else                {id[j]=id[i]; sz[i]+=sz[j];}
}

bool fun(const node &a, const node &b)
{
    return a.w < b.w;
}

int main()
{
    int t,n,u,v,w,ans,k,adj;
    n=GI; k=0; ans=0;
    // Take input of edges, their weights and set id[i]=i, sz[i]=1 for each vertex
    REP(i,0,n)
    {
      e[k].x=GI; e[k].y=GI; e[k].w=GI;
      id[e[k].x]=e[k].x; id[e[k].y]=e[k].y; sz[e[k].x]=1; sz[e[k].y]=1;
      k++;
    }
    sort(e,e+k,fun);
    REP(i,0,k)
    {
      if(root(e[i].x)!=root(e[i].y))
      {
        qunion(e[i].x,e[i].y);
        ans+=e[i].w;
      }
    }
 printf("%d\n",ans);
 return 0;
}
Practice Problem:
BLINNET

Sieve Of Eratosthenes – Primality testing for numbers in a range

If u want to generate a list of all the primes in the range "1 to n" then a very simple to code algorithm which u can use is Sieve Of Eratosthenes.

The sieve algorithm explained in simple words is:
1) Mark all numbers in the range as prime.
2) Mark 1 to be trivially not prime
3) Mark even numbers (except two) to be trivially not prime.
4) Start from i = 3 (next number which is marked true) and mark all its multiples from i*i to n as not prime.
5) Continue no. 4 for all i till sqrt(n).

bool flag[n+1];
memset(flag, true, sizeof flag);
flag[1] = false;
for(int i=4; i <= n; i+=2)
 flag[i] = false;
for(int i=3; i <= sqrt(n); i+=2)
{
  if(flag[i])
    for(int j=i*i; j<=n; j += i)
      flag[j] = false;
}
Complexity of the algorithm is O(n log log n) in terms of arithmetic operations.

Now, you can use this base table to solve a wide variety of problems involving prime numbers.

Practice Problem:
PRIME1

ex-Opti: An Expense Management System.

eX-Opti is a multi-platform expense management system built on the Cloud and hosted on Google App Engine.It can be accessed either through a PC or a mobile web browser (www.ex-opti.appspot.com) or through telephony (Call +91-4466949325). It was built as a part of Desert Hack, a 2 day hackathon. It was also my first Cloud based App venture.
Some of its features are:
  • Add expense entries feature-wise.
  • View details and analytics of past spendings, over a custom range of days.
  • View graphs of category-wise spending.
  • A mobile call can be used to update the spendings (for the large user base with no mobile internet in India) - using DTMF.
Technical details of the app:
  • Built using Google App Engine, Python, JavaScript and HTML
  • Site is optimized for mobile - using jQuery Mobile.
  • Uses Kookoo API for telephony support and Google Charts API for generating charts of Analytics.
U can watch the App Demo which we made here:

The project is now hosted open source on Github. Feel free to fork/clone it and develop it further.

Competition Programming Template

In Competition Programming, where speed of solving problems matter, it is good to create a template of useful libraries, commonly used statements like for loops simplified using #define and some typedefs for things like long long, pair<int,int> etc. These make ur coding speed faster as u often have lesser code to type and the resulting code becomes more readable as well. However overuse of such templates might spoil your coding style and make your code unreadable by anyone else. So be wary of their use.
Here is my programming template in C++:
# include <iostream>
# include <fstream>
# include <sstream>
# include <algorithm>
# include <cstdio>
# include <cmath>
# include <numeric>
# include <cstdlib>
# include <cstring>
# include <vector>
# include <list>
# include <set>
# include <map>
# include <stack>
# include <queue>
# include <cctype>
# include <climits>
# include <complex>

using namespace std;

typedef long long LL;
typedef unsigned long long ULL;
typedef pair<int,int> PII;
typedef vector<int> VI;
typedef vector<VI> VVI;
typedef pair<int,PII> TRI;
typedef vector<string> VS;

#define GI ({int t;scanf("%d",&t);t;})
#define REP(i,a,b) for(int i=a;i<b;i++)
#define FOR(i,n) REP(i,0,n)
#define ALL(v) (v).begin(),(v).end()
#define TR(i,x) for(typeof(x.begin()) i=x.begin();i!=x.end();i++)
#define bitcount(x) __builtin_popcount(x)
#define pb push_back
#define mp make_pair
#define mt(a,b,c) mp(a,mp(b,c))
#define EPS (double)(1e-9)
#define INF 1000000000
#define MOD 1000000007
#define PI (double)(3.141592653589793)

inline int ni()
{
 register int r=0,c;
 for(c=getchar_unlocked();c<=32;c=getchar_unlocked());
 if(c=='-')
  return -ni();
 for(;c>32;r=(r<<1)+(r<<3)+c-'0',c=getchar_unlocked());
 return r;
}

int main()
{
 int t;
 t = ni();
 while(t--) {
  
 }
 return 0;
}
If you are use windows while coding, u might also find it useful to use:
freopen("input.txt","r",stdin);
freopen("output.txt","w",stdout);
to test your code.

Generating all subsets of a set

Sometimes, u need to enumerate all possible subsets of a size n as a sub-problem of some bigger problem. This can be done using bitmasks (ie) using bits of an integer to represent whether the element is there in the current subset(bit is 'set' or 1) or it is not in the current subset(bit is 'not set' or 0). C++ code with explanation follows:
int n=10; // size of set
int n2=1<<n; // set of size n has 2^n subsets
for(int i=1;i<n2;i++)
{
 mask = i;
 pos=n-1;
 while(mask>0)
 {
  if((mask&1)==1)
  {
  // Element at pos belongs to the current subset.
  }
  mask>>=1;
  pos--;
 }
}

ICPC Amrita Online Contest 2012 -- Editorials

This is my first attempt at writing an editorial, so please correct me if there are any errors and suggest improvements.
General Comments about the Contest:
There were 5 problems in all. I felt that the problems were arranged in the increasing order of difficulty (as can also be seen by the final rank list). The conduction of the contest was not so good as the server crashed multiple times throughout the competition.
U can download the complete Problem Set here: icpcquestions.docx
For solutions to the problems, i have used my C++ Competition Programming Template which u can find here. I have not included them in the solutions posted here for the sake of simplicity.
The Problems:
A: Pair Socks
Given a string containing a series of characters 'R', 'G', 'B' and 'W' representing red, green, blue and white socks, find if each of these can be grouped in pairs of same colour.
Solution:
A very trivial problem. Count the number of occurrences of each character. If all of the counts are divisible by 2, then it is possible - print 'YES'. Otherwise print 'NO'.
int main()
{
 int t,r=0,g=0,b=0,w=0;
 string s;
 t=GI;
 while(t--)
 {
  r=0;g=0;b=0;w=0;
  cin>>s;
  REP(i,0,s.size())
  {
   if(s[i]=='R') r+=1;
   if(s[i]=='G') g+=1;
   if(s[i]=='B') b+=1;
   if(s[i]=='W') w+=1;
  }
  if(r%2==0 && g%2==0 && b%2==0 && w%2==0)
   printf("YES\n");
  else
   printf("NO\n");
 }
 return 0;
}
B: Sticks
You have N sticks - the length of the ith stick being L[i]. You also have M 3-dimensional boxes - the length, breadth and height of the jth box being A[j], B[j] and C[j]. What is the maximum number of sticks which can fit in the boxes? A stick can be fit in a box if it can be placed within it without bending or cutting it. A box can hold any number of fitting sicks (assume that the sticks do not intersect each other).
Solution:
Again a trivial problem, but had a small trick in the implementation. Since a box can hold any number of fitting sticks, we can just take the largest box and check whether we can put each of the sticks into it oriented across the body diagonal (ie) from (0,0,0) to (A[i],B[i],C[i]). The trick in this problem is that A[i]*A[i] + B[i]*B[i] + C[i]*C[i] would overflow if u were using integer values. So if u declare them as long/long long u will be fine.
int main()
{
 int t, n, m, cnt;
 LL l[100005],x,y,z,val;
 t=GI;
 while(t--)
 {
  cnt=0;val=0;
  n=GI; m=GI;
  REP(i,0,n)
  {
   l[i]=GI;
  }
  REP(i,0,m)
  {
   x=GI; y=GI; z=GI;
   val = max(val,x*x+y*y+z*z);
  }
  REP(i,0,n)
  {
   if(l[i]*l[i]<=val)
    cnt++;
  }
  printf("%d\n",cnt);
 }
 return 0;
}
C: Sum Range

You are given N numbers a[1..N] and 2 other integers L and H. Count the number of tuples (i,j,k) satisfying i < j < k and L <= a[i] + a[j] + a[k] <= H.
Solution:
Probably the trickiest problem of them all (not toughest), both to implement as well as to understand why the complexity of the solution is O(n^2) inspite of having three nested for loops.
Firstly, we can see that order of the original array does not matter (since even if the order is changed, the three elements can be rearranged to match their order in the original array so that the i<j<k condition is satisfied). So we sort the original array. Now, we can reduce the problem statement to iterating across all possible i and k (ie) from left and right ends of the array and finding the number of possible j indices for each of them. This might sound like an O(n^3) solution, but we avoid that by maintaining the lowest  value j can take and the highest value j can take - this reduces the complexity to O(n^2).
A simpler solution to this problem in O(N^2logN) is also possible using binary search, which also passed the time limit.
int main()
{
 int t, n, l, h, a[1005],j,p;
 LL cnt;
 t=GI;
 while(t--)
 {
  cnt=0;
  n=GI; l=GI; h=GI;
  REP(i,0,n)
   a[i]=GI;
  sort(a,a+n);
  REP(i,0,n-2)
  {
   j=i+1;
   p=i+1;
   for(int k=n-1;k>i+1;k--)
   {
    for(;j<k;j++)
    {
     if(a[i]+a[j]+a[k]>=l)
      break;
    }
    for(;p<k;p++)
    {
     if(a[i]+a[p]+a[k]>h)
      break;
    }
    if(p>i+1)
     p--;
    if(a[i]+a[j]+a[k]>=l && a[i]+a[p]+a[k]<=h && j<=p && j>i)
     cnt+=(p-j+1);
   }
  }
  printf("%lld\n",cnt);
 }
 return 0;
}
D: Team Division

Given a list of teams and their perceived rival team list, find the number of ways that u can divide them into two groups such that no team is in the same group as its rival team.
Solution:
There was a small confusion in the statement "The rivalry relation is not symmetric". It means that in the given input data, a is a rival of b doesn't mean that b will be a rival of a. However, when we assign teams we need to make sure that if either a is a perceived rival of b or b is a perceived rival of a, both shouldn't be in the same site. This can be handled by simply adding a to b's list of rivals and b to a's list of rivals if either one perceives the other as a rival.
Then, we can perform a DFS on all the nodes to separate them into bipartite components (ie) components in which one group should be in a different site than the other. The answer would then be 2^(number of components). When performing the DFS, if we see that such a bipartite division is not possible, then no division is possible. So we can output '0' in that case.
int a[20005],b[20005],cnt,num,val;
pair<int,int> top;
set<int> v[20005];
bool seen[20005];

LL mod_exp(int cnt)
{
 if(cnt==1)
  return 2;
 if(cnt%2==0)
 {
  LL temp=mod_exp(cnt/2);
  return (temp*temp)%MOD;
 }
 else
 {
  LL temp=mod_exp(cnt-1);
  return (2*temp)%MOD;
 }
}

int main()
{
 int t, n;
 t=GI;
 while(t--)
 {
  memset(a,-1,sizeof a);
  memset(b,-1,sizeof b);
  memset(seen,false,sizeof seen);
  cnt=0;
  n=GI;
  REP(i,0,n)
   v[i].clear();
  REP(i,0,n)
  {
   num=GI;
   REP(j,0,num)
   {
    val=GI;
    if(v[i].find(val)==v[i].end())
     v[i].insert(val);
    if(v[val].find(i)==v[val].end())
     v[val].insert(i);
   }
  }

  REP(i,0,n)
  {
   if(seen[i]) continue;
   seen[i]=true;
   cnt++;
   a[i]=cnt;
   stack<pair<int,int> >s;
   TR(it,v[i])
   {
    s.push(mp(*it,1));
   }
   while(!s.empty())
   {
    top=s.top();s.pop();
    if(seen[top.first]) continue;
    seen[top.first]=true;
    TR(it,v[top.first])
    {
     if(seen[*it]) continue;

     if(top.second==0)
     {
      if(a[*it]!=-1)
      {
       printf("0\n");
       goto next;
      }
      s.push(mp(*it,1));
      b[*it]=cnt;
     }
     if(top.second==1)
     {
      if(b[*it]!=-1)
      {
       printf("0\n");
       goto next;
      }
      s.push(mp(*it,0));
      a[*it]=cnt;
     }
    }
   }
  }
  printf("%lld\n",mod_exp(cnt)); // second time
  next:;
 }
 return 0;
}
E: Sequence

Given a sequence S of '0' and '1' of length N, a subsequence of S is obtained by omitting some elements of S without changing the order of the remaining elements of S.
Given a subarray of a sequence S starting from some position 'a' and ending at some position 'b' (1 <= a <= b <= N), both inclusive, denoted by S(a,b), you have report the length of longest non-decreasing subsequence of S(a,b). By non-decreasing, we mean that a '0' never appears after a '1' in the subsequence.
1<=Number of elements in the array<=10000, 1<=Number of queries<=100000.
Solution:
This problem required the use of a data structure called a segment tree. A segment tree is a data structure which allows u to do range queries like maximum, sum, frequency etc. in an interval in O(logN) time with O(N) pre-processing. Read the description of how to construct this here - http://community.topcoder.com/tc?module=Static&d1=tutorials&d2=lowestCommonAncestor
Now, the solution of the problem.
Maintain three things in the segment tree - a, z and o.
a - length of longest subsequence in an interval
z - number of zeros in an interval
o - number of ones in an interval
We can combine the segments as follows during initialization:
a[n] = max(a[2*n] + o[2*n+1], z[2*n] + a[2*n+1])
z[n] = z[2*n] + z[2*n+1]
o[n] = o[2*n] + o[2*n+1]
Similar query function.
#define MAXN 1000000
struct rsq
{
 int a;
 int o;
 int z;
}tree[MAXN];
string s;

void init(int n, int b, int e)
{
 if(b==e)
 {
  if(s[b]=='0') tree[n].a=tree[n].z=1,tree[n].o=0;
  else  tree[n].a=tree[n].o=1,tree[n].z=0;
 }
 else
 {
  init(2*n,b,(b+e)/2);
  init(2*n+1,(b+e)/2+1,e);
  tree[n].o = tree[2*n].o + tree[2*n+1].o;
  tree[n].z = tree[2*n].z + tree[2*n+1].z;
  tree[n].a = max(tree[2*n].a+tree[2*n+1].o,tree[2*n].z+tree[2*n+1].a);
 }
}

rsq query(int n, int b, int e, int i, int j)
{
 if(i<=b && j>=e)
  return tree[n];
 int mid=(b+e)/2;
 if(j<=mid)
  return query(2*n,b,mid,i,j);
 if(i>mid)
  return query(2*n+1,mid+1,e,i,j);

 rsq x,y,ans;
 x = query(2*n,b,mid,i,j);
 y = query(2*n+1,mid+1,e,i,j);
 ans.o = x.o + y.o;
 ans.z = x.z + y.z;
 ans.a = max( x.a+y.o, x.z+y.a );

 return ans;
}

int main()
{
 int n,q,t,x,y;
 t=GI;
 while(t--)
 {
  n=GI;
  REP(i,0,n+1) tree[i].a=tree[i].o=tree[i].z=0;
  cin>>s;
  s=" "+s;
  init(1,1,n);
  q=GI;
  REP(i,0,q)
  {
   x=GI; y=GI;
   printf("%d\n",query(1,1,n,x,y).a);
  }
 }
 return 0;
}
Feel free to leave any comments, suggestions, alternative solutions in the comments below.