Substring With Concatenation Of All Words
You are given a string s, and an array of strings words. All the strings of words are of the same length
A concatenated substring in s is a substring that contains all the strings of any permutation of words concatenated. Essentially, if all the words in words appear exactly once consecutively, then the substring is a concatenated substring.
Return the starting indices of the concatenated substrings in s. You can return the answer in any order.
Examples
Input:s = "barfoothefoobarman", words = ["foo, bar"] –> [0, 9]
Input: s = "wordgoodgoodgoodbestword", words = ["word", "good", "best", "word"] –> []
Solution 1: Brute Force Hashmap Comparison
Let the size of s be n, let the number of elements in words be m, let the length of a word in words be w. It follows that every concatenated substring will have length m * w. We can brute force a solution as follows:
- load all
wordsinto a hashmaptotal - For each length
m * wsubstring ofs, split into lengthwpieces, add the pieces to a new hashmapprospectand checkprospect == total - Return a vector of all the indices where step 2 is true. Putting these steps into code, we get this solution:
class Solution{
public:
vector<int> findSubstring(string s, vector<string>& words){
vector<int> res;
if(words.size() == 0) return res;
unordered_map<string, int> total;
for(string& word: words){
total[word]++;
}
int m = words.size();
int w = words[0].size();
for(int i = 0; i < s.size() - (m*w) + 1; ++i){
unordered_map<string, int> prospect;
int j = i;
while(j < i + (m*w)){
string word = s.substr(j, w);
prospect[word] += 1;
j+= w;
}
if(prospect == total) res.push_back(i);
}
return res;
}
};
Runtime Analysis:
- Building
total:O(1) * m = O(m) - Building
prospectand comparing withtotal:O(m*w)(There are(m*w)characters in both maps) - Iterating through
s:O(n)Total Runtime:O(m) + O(n * m * w) = O(n * m* w)
Space Analysis:
total:O(m*w)prospect:O(m*w)(ntimes) Space complexity is alsoO(m*n*w)
Both space and time are cubic: can we do better? Looking at this solution, we do a lot of duplicate comparisons. if we create a map prospect and check all the internals against total, then discard it, we waste a lot of comparisons that we could have saved for future iterations…
Solution 2: Sliding Window Hashmap Comparison
We can use a sliding window approach to avoid duplicate comparisons. As before, let the size of s be n, let the number of elements in words be m, let the length of a word in words be w. Because hashmaps can only be recycled when the next hashmap begins a multiple of w away, we will need an outer loop of size n. Our solution has three steps:
- Create a hashmap (
total) of all strings in words, with higher values corresponding to more occurrences - Loop through all possible remainders (mod
w). In each iteration:- create a new hashmap
prospect, and a new stringt - Create a loop of length
nto loop through the characters ins. In each iteration:- increment
tuntil it is lengthw, then addttoprospect - when your window size has reached the size of a concatenated string, check whether
prospect==total. If it does, save the start of the window.
- increment
- create a new hashmap
- return the list of valid starting indices
Coding this up:
class Solution{
public:
vector<int> findSubstring(string s, vector<string>& words){
int n = s.size();
int m = words.size(), w = words[0].size();
unordered_map<string, int> total;
for(string& word : words){
total[word]++;
}
vector<int> res;
for(int i = 0; i < w; i++){
unordered_map<string, int> prospect;
string t = "";
for(int j = i, k = i; k < n; k++){
t += s[k];
if(t.size() == w){
prospect[t]++;
t = "";
}
if(k-j+1 == m*w){
if(prospect == total)res.push_back(j);
string remove = s.substr(j, w);
if(prospect[remove] > 1) prospect[remove]--;
else prospect.erase(remove);
j+=w;
}
}
}
return res;
}
};