Question : Given an array containing only 0's and 1's, find the maximum contiguous subarray containing equal no of 0's and 1's.
Complexity : O(n).
Source : Asked to me by Rajesh Kamineni
My Solution :
Call the original array 'A'. Let it's size be 'n'.
Create 2 arrays 'B' and 'C'of size '2n + 1' each. Initialize each to -1 everywhere.
Then, do the following [2]:
Here's what the above procedure does :
Let us denote, (No. of 1s in A before index i) - (No. of 0s in A before index i) by #(i).
Note that when we enter the i-th iteration, diff = (No. of 1s in A before index i) - (No. of 0s in A before index i) = #i
Also note that the value of diff can range from '-n' to 'n'[1].
As a result, we have
B[j] = the smallest index 'i' in the array A such that, #(i) = j - n,
C[j] = the largest index 'i' in the array A such that, #(i) = j - n,
i.e. #(B[j]) = #(C[j])
We need the largest sub-array of A with equal number of 1s and 0s.
Suppose the required sub-array is from index 'a' to index 'b' of A (with elements at index 'a' and index 'b' included).
Then, clearly,
#(b+1) = #(a) (Do you see why?)
And our goal is to maximize (b+1) - (a).
Therefore, we simply have to find the index 'j' such that C[j] - B[j] is maximum. This can done in O(n), in a single pass.
Once the 'j' is found, the required sub-array is from index 'B[j]' to index 'C[j] - 1' of A.
Notes :
Complexity : O(n).
Source : Asked to me by Rajesh Kamineni
My Solution :
Call the original array 'A'. Let it's size be 'n'.
Create 2 arrays 'B' and 'C'of size '2n + 1' each. Initialize each to -1 everywhere.
Then, do the following [2]:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
int diff = 0; | |
for (int i = 0; i <= n; i++){ | |
if (B[diff + n] == -1) B[diff + n] = i; | |
else B[diff+ n] = min(B[diff + n], i); | |
if (C[diff + n] == -1) C[diff + n] = i; | |
else C[diff + n] = max(C[diff + n], i); | |
if (i < n && A[i] == 0) diff--; | |
else diff++; | |
} |
Let us denote, (No. of 1s in A before index i) - (No. of 0s in A before index i) by #(i).
Note that when we enter the i-th iteration, diff = (No. of 1s in A before index i) - (No. of 0s in A before index i) = #i
Also note that the value of diff can range from '-n' to 'n'[1].
As a result, we have
B[j] = the smallest index 'i' in the array A such that, #(i) = j - n,
C[j] = the largest index 'i' in the array A such that, #(i) = j - n,
i.e. #(B[j]) = #(C[j])
We need the largest sub-array of A with equal number of 1s and 0s.
Suppose the required sub-array is from index 'a' to index 'b' of A (with elements at index 'a' and index 'b' included).
Then, clearly,
#(b+1) = #(a) (Do you see why?)
And our goal is to maximize (b+1) - (a).
Therefore, we simply have to find the index 'j' such that C[j] - B[j] is maximum. This can done in O(n), in a single pass.
Once the 'j' is found, the required sub-array is from index 'B[j]' to index 'C[j] - 1' of A.
Notes :
[1] - One may further note that diff can take at most n+1 different values, and we don't explicitly require the actual value of diff. So the arrays B and C can be of size 'n + 1' each and the B[diff] can be replaced by B[diff%n].
(Do you see why?)
[2] - Optimization suggested by Vinod :
Note that 'i' increases from '0' to 'n' as the loop is executed. As a result, the value of 'B[diff + n]' needs to be set only once, for the lowest 'i'. Similarly, subsequent values of 'i' are always greater than 'C[diff + n]', so C[diff + n] needs to be set every time.
The check 'i < n' can also be avoided by increasing the size of A to 'n + 1' elements. It does matter what the value of A[n] is, because the final value of diff is never used.
Here is the optimized code :
Note that 'i' increases from '0' to 'n' as the loop is executed. As a result, the value of 'B[diff + n]' needs to be set only once, for the lowest 'i'. Similarly, subsequent values of 'i' are always greater than 'C[diff + n]', so C[diff + n] needs to be set every time.
The check 'i < n' can also be avoided by increasing the size of A to 'n + 1' elements. It does matter what the value of A[n] is, because the final value of diff is never used.
Here is the optimized code :
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
int diff = 0; | |
for (int i = 0; i <= n; i++){ | |
if (B[diff + n] == -1) B[diff + n] = i; | |
else C[diff + n] = i; | |
if (A[i]) diff++; else diff--; | |
} |
Feel free to post alternate solutions, extensions, comments, clarifications, optimizations, and mistakes (if any) as comments.
I suggest a small optimization though it doesn't matter much.The two if else blocks can be merged into one if-else block given below.
ReplyDeleteif(B[diff+n] == -1)B[diff+n] = i;
else C[diff+n] = i;
the reason is i is increasing over 0 to n so in else part of first block i is always greater than B[diff+n] and similar reasoning for else of second block.
You are right. Thanks!
ReplyDelete