How do you find the longest common substring?

From Zulfqar Chachar

Technology and programming languages are constantly evolving. However, what remains the same in every programming language is its basics or fundamental concepts. One such basic concept in every programming language is the string.

A string is defined as a data type that is used to store a sequence of characters. Seems easy, right?

Let’s be honest, the string is a huge concept in itself. You can perform multiple operations on a string which makes it one of the most-asked concepts in any coding interview or exam.

One of the most talked-about concepts related to strings is the substring. A substring is simply a sequence of characters that appears contiguously, with no gaps, inside a string.

Here, we are going to talk about one very common coding problem related to substrings, i.e., how to find the longest common substring.

But before understanding the problem, let’s discuss in detail what a substring is.

Understanding Substring

As the name suggests, a substring is simply a part of the original string. This substring is a continuous sequence of characters whose length is less than or equal to the length of the original string.

To understand this, consider the following example:

If the given string is A= “efghij”

The substrings of this given string will be e, f, g, h, i, j, ef, fg, gh, hi, ij, efg, fgh, ghi, hij, and so on, up to efghij itself.
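To see the full list for yourself, here is a minimal Python sketch that enumerates every substring of a string. The function name all_substrings is only an illustrative choice and does not come from any library.

```python
def all_substrings(text):
    """Return every contiguous substring of text, shortest first."""
    n = len(text)
    return [
        text[start:start + length]
        for length in range(1, n + 1)        # substring length
        for start in range(n - length + 1)   # starting index
    ]


substrings = all_substrings("efghij")
print(len(substrings))  # a string of length n has n * (n + 1) / 2 substrings
print(substrings)
```

Note that "eg" never appears in the output, because e and g are not next to each other in the original string.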

Understanding The Problem Statement

Here, you will be given two strings. You need to write code to find the longest substring that appears in both of them.

For instance, you are given the following strings:

String 1= "apple"

and String 2="ape"

The longest common substring between the two given strings is ap. 

Methods To Find The Longest Common Substring

To find the longest common substring of given strings, you can use the following methods:

  • Brute force approach
  • Dynamic programming approach
  • Recursive approach

Brute Force Approach

The most basic approach to finding the longest common substring is to check all the possible substrings present in string 1. You will then have to check if each of the substrings of string 1 is a substring of string 2 or not. 

A simple way to do this is to fix a beginning index in each string and compare the characters that follow, one pair at a time, for as long as they match. The number of matching characters is the length of the common substring for that pair of beginning indices. Repeat this for every possible pair of beginning indices and keep the largest length you find. 

Algorithm:

  • To begin with, create a function with the name "substringlength". This function will accept two parameters, which are strings 1 and 2. Also, declare a variable with the name maximumlength and initialise it to 0. This variable will store the length of the required substring. 
  • You will now have to initialise variables n1 and n2 with the lengths of string 1 and string 2. 
  • Now, you need to run two nested for loops to fix the beginning indices in the given strings. 
  • Inside the nested for loops, you will have to run a while loop that counts how many consecutive characters match from the two beginning indices. 
  • You will have to update the maximumlength variable whenever this count is larger than its current value. When you are done checking all the possible pairs of beginning indices, you need to print the final value of maximumlength. 
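The steps above can be sketched in Python as follows. The names substringlength, maximumlength, n1 and n2 are taken from the steps; the counter k is an extra name added for illustration, and the function returns the length so that the caller can print it.

```python
def substringlength(string1, string2):
    """Brute force: length of the longest common substring."""
    n1, n2 = len(string1), len(string2)
    maximumlength = 0

    for i in range(n1):          # beginning index in string 1
        for j in range(n2):      # beginning index in string 2
            k = 0                # characters matched so far from (i, j)
            while (i + k < n1 and j + k < n2
                   and string1[i + k] == string2[j + k]):
                k += 1
            maximumlength = max(maximumlength, k)

    return maximumlength


print(substringlength("apple", "ape"))  # 2, for the substring "ap"
```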

Complexity Analysis

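Time complexity: The time complexity of this method is O(N*M*min(N, M)), because for each of the N*M pairs of beginning indices the while loop can advance by at most min(N, M) characters. 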

Space complexity: The space complexity of this method is calculated as O(1). 

Dynamic Programming Approach

The next method that you can use to find the longest common substring of the given strings is dynamic programming. Here, we find the longest common suffix for every pair of prefixes of the two strings. The largest of these values is the length of the longest common substring. 

Algorithm:

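  • You need to create a dynamic programming array dp of size (N+1)*(M+1). Here, N and M represent the lengths of the two strings, and dp[i][j] stores the length of the longest common suffix of the first i characters of string 1 and the first j characters of string 2. 
  • After that, you will have to iterate over both strings using a nested loop. 
  • Follow the conditions mentioned below:

If i==0 or j==0, then dp[i][j]=0

If str1[i-1]==str2[j-1], then dp[i][j]=1+dp[i-1][j-1]

Otherwise, dp[i][j]=0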

  • At the same time, you will have to keep the maximum value updated. 
  • In the end, print the maximum value. 
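Here is a minimal Python sketch of the dynamic programming idea. It assumes a table with one extra row and one extra column, so that the cells where i or j is 0 serve as the zero base case. The names longest_common_substring, best_length and best_end are illustrative, and the function returns the substring itself rather than only its length.

```python
def longest_common_substring(str1, str2):
    """Return one longest common substring of str1 and str2."""
    n, m = len(str1), len(str2)
    # dp[i][j]: length of the longest common suffix of
    # str1[:i] and str2[:j]; row 0 and column 0 stay at 0.
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    best_length = 0   # maximum value seen so far
    best_end = 0      # end position (exclusive) in str1 of that match

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if str1[i - 1] == str2[j - 1]:
                dp[i][j] = 1 + dp[i - 1][j - 1]
                if dp[i][j] > best_length:
                    best_length = dp[i][j]
                    best_end = i
            # otherwise dp[i][j] keeps its initial value of 0

    return str1[best_end - best_length:best_end]


print(longest_common_substring("apple", "ape"))  # ap
```

If several common substrings share the maximum length, this sketch returns the one that ends earliest in the first string.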

Complexity Analysis

Time Complexity: The time complexity of this method is calculated as O(n*m). 

Space complexity: The space complexity of this method is calculated as O(n*m). 

Recursive Approach

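In this approach, the main idea is to compare the characters of the two strings from the end, one pair at a time, and to try every possible alignment through recursive calls. If the characters match, you extend the current match by one. If they do not, the match count starts again from zero. 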

Algorithm:

  • You need to first initialise one variable res to 0. It will count the length of the match that is currently being extended. 
  • Now, consider that i and j are indexes into string 1 and string 2 which initially point to the last character of each string. 
  • In case the characters at these indexes are the same, you need to add 1 to the value of res and move both i and j one position back. 
  • In addition, try the two other alignments by decrementing only i, and then only j, with res reset to 0. Keep the largest of the values returned by these calls. 
  • You need to repeat the whole process until you go past the starting index of either of these strings. 
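A minimal recursive sketch of these steps in Python is shown below. The names i, j and res follow the steps; the function name lcs_recursive and the local variable names are illustrative. It runs in exponential time, so it is only suitable for short strings.

```python
def lcs_recursive(str1, str2, i, j, res):
    """Length of the longest common substring of str1[:i+1] and
    str2[:j+1], where res is the length of the match being extended."""
    if i < 0 or j < 0:
        return res

    extended = res
    if str1[i] == str2[j]:
        # Characters match: extend the current match by one.
        extended = lcs_recursive(str1, str2, i - 1, j - 1, res + 1)

    # Try the other alignments; any match found there starts from 0.
    skip_first = lcs_recursive(str1, str2, i - 1, j, 0)
    skip_second = lcs_recursive(str1, str2, i, j - 1, 0)

    return max(extended, skip_first, skip_second)


s1, s2 = "apple", "ape"
print(lcs_recursive(s1, s2, len(s1) - 1, len(s2) - 1, 0))  # 2
```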

Complexity Analysis

  • Time complexity: The time complexity of this method is O(3^(n+m)), because each call can make up to three further calls and the recursion is at most n+m levels deep. 
  • Space complexity: The space complexity of this method is calculated as O(max(n, m)). 

Many people confuse the longest common substring problem with the longest consecutive subsequence problem. The two are unrelated.

In the longest consecutive subsequence problem, you will be given an array consisting of integer values. You will have to determine the longest subsequence of the array in such a way that the subsequence elements are consecutive in nature. 
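To make the difference concrete, here is a small Python sketch of one common way to solve the longest consecutive subsequence problem using a set. The function name longest_consecutive is illustrative, and this is only one possible approach.

```python
def longest_consecutive(values):
    """Length of the longest run of consecutive integers in values."""
    present = set(values)
    best = 0
    for value in present:
        # Only start counting from the smallest number of a run.
        if value - 1 not in present:
            length = 1
            while value + length in present:
                length += 1
            best = max(best, length)
    return best


print(longest_consecutive([100, 4, 200, 1, 3, 2]))  # 4, for 1, 2, 3, 4
```

Notice that this problem takes a single array of integers and ignores the order of the elements, whereas the longest common substring problem takes two strings and requires the characters to be adjacent.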

Conclusion

When it comes to problems related to strings, the longest common substring problem is commonly asked in many interviews and exams. 

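The longest common substring problem can be solved using the three methods described above. However, not every method is practical for every input. The brute force and recursive approaches become slow as the strings grow, while dynamic programming is faster but needs extra memory for its table. 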

Therefore, before solving the problem, make sure to go through all the possible methods to resolve the problem and choose the optimal solution. 
