Fringes: 1's and 0's

Question : Given an array containing only 0's and 1's, find the maximum contiguous subarray containing equal no of 0's and 1's.

Complexity : O(n).

Source : Asked to me by Rajesh Kamineni

My Solution :

Call the original array 'A'. Let it's size be 'n'.
Create 2 arrays 'B' and 'C'of size '2n + 1' each. Initialize each to -1 everywhere.

Then, do the following [2]:

	int diff = 0;

	for (int i = 0; i <= n; i++){
	if (B[diff + n] == -1) B[diff + n] = i;
	else B[diff+ n] = min(B[diff + n], i);

	if (C[diff + n] == -1) C[diff + n] = i;
	else C[diff + n] = max(C[diff + n], i);

	if (i < n && A[i] == 0) diff--;
	else diff++;
	}

view raw snip2.cpp hosted with ❤ by GitHub

Here's what the above procedure does :
Let us denote, (No. of 1s in A before index i) - (No. of 0s in A before index i) by #(i).

Note that when we enter the i-th iteration, diff = (No. of 1s in A before index i) - (No. of 0s in A before index i) = #i
Also note that the value of diff can range from '-n' to 'n'[1].

As a result, we have
B[j] = the smallest index 'i' in the array A such that, #(i) = j - n,
C[j] = the largest index 'i' in the array A such that, #(i) = j - n,

i.e. #(B[j]) = #(C[j])

We need the largest sub-array of A with equal number of 1s and 0s.
Suppose the required sub-array is from index 'a' to index 'b' of A (with elements at index 'a' and index 'b' included).
Then, clearly,
#(b+1) = #(a) (Do you see why?)
And our goal is to maximize (b+1) - (a).

Therefore, we simply have to find the index 'j' such that C[j] - B[j] is maximum. This can done in O(n), in a single pass.
Once the 'j' is found, the required sub-array is from index 'B[j]' to index 'C[j] - 1' of A.

Notes :

[1] - One may further note that diff can take at most n+1 different values, and we don't explicitly require the actual value of diff. So the arrays B and C can be of size 'n + 1' each and the B[diff] can be replaced by B[diff%n].

(Do you see why?)

[2] - Optimization suggested by Vinod :
Note that 'i' increases from '0' to 'n' as the loop is executed. As a result, the value of 'B[diff + n]' needs to be set only once, for the lowest 'i'. Similarly, subsequent values of 'i' are always greater than 'C[diff + n]', so C[diff + n] needs to be set every time.

The check 'i < n' can also be avoided by increasing the size of A to 'n + 1' elements. It does matter what the value of A[n] is, because the final value of diff is never used.

Here is the optimized code :

	int diff = 0;

	for (int i = 0; i <= n; i++){
	if (B[diff + n] == -1) B[diff + n] = i;
	else C[diff + n] = i;
	if (A[i]) diff++; else diff--;
	}

view raw opt_1s_0s.cpp hosted with ❤ by GitHub

Feel free to post alternate solutions, extensions, comments, clarifications, optimizations, and mistakes (if any) as comments.

2 comments:

Vinod Reddy GJune 18, 2012 at 10:37 PM
I suggest a small optimization though it doesn't matter much.The two if else blocks can be merged into one if-else block given below.
if(B[diff+n] == -1)B[diff+n] = i;
else C[diff+n] = i;

the reason is i is increasing over 0 to n so in else part of first block i is always greater than B[diff+n] and similar reasoning for else of second block.
AakashJune 19, 2012 at 2:58 PM
You are right. Thanks!

Monday, June 18, 2012

1's and 0's

2 comments: