Technical Knowledge Search Using Data Science

Vinaya: We have Mohan with us to talk about this topic.
Hi, Mohan! Why did the client come to Intuceo?

Mohan: This client is the vision care R&D division of a large conglomerate. Their research goes back several decades, and the documents supporting that research were spread all over their IT infrastructure. Several disparate systems were involved, old and new, in different locations. The systems were connected, but the client was not sure which data lived where; they only knew where the repositories were. One of the challenges was how to find the documents relevant to the topic they were currently researching. The objective was to reuse existing information instead of reinventing the wheel, but they were unable to do that because of the way the documents were stored across various systems. Our proposal was this: the documents can stay wherever they are, but we will build a database of their metadata and provide search on top of it. That is the challenge we wanted to solve in our analytics way.
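The metadata-database idea Mohan describes can be sketched as a record per document that stays in its source system, with only the metadata centralized for search. The field names, repository names, and file path below are illustrative assumptions, not details from the engagement:

```python
from dataclasses import dataclass, asdict, field

# A minimal sketch of a centralized metadata record. The document itself
# never moves; only this record is stored in the searchable database.
# All field names here are illustrative assumptions.
@dataclass
class DocumentMetadata:
    doc_id: str           # unique identifier in the central index
    source_system: str    # e.g. "file-share", "email-archive"
    location: str         # path or URI of the document in its home system
    doc_type: str         # "pdf", "docx", "email", ...
    title: str
    keywords: list = field(default_factory=list)  # terms extracted for search

record = DocumentMetadata(
    doc_id="doc-0001",
    source_system="file-share",
    location="//rnd-server/optics/lens_coating_study.pdf",
    doc_type="pdf",
    title="Lens coating durability study",
    keywords=["lens", "coating", "durability"],
)
print(asdict(record)["doc_type"])
```

A search service then queries only these records and hands back locations, so researchers never need to know which system holds which document.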

Vinaya: How did you implement it?

Mohan: First we looked at where the data was stored: in which locations, and in how many types, whether Word documents, PDFs, or emails. There were many different locations where the data actually lived, so we mapped out where all the repositories lie today. We also studied what a researcher typically searches for, what data they want to retrieve, and the gap between the information they were getting and the information they were looking for. After that, we looked at which parts of the solution we already had in our system. Then we built a roadmap for indexing the existing documents and fitting our search engine on top of them. From there it was the normal lifecycle of design, development, and testing.
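The first step Mohan mentions, discovering what document types live where, can be sketched as a simple repository survey. The directory layout and extensions below are assumptions for illustration; a real survey would also cover email archives and other non-filesystem sources through connectors:

```python
import os
import tempfile
from collections import Counter

def survey_repository(root):
    """Walk one document repository and count documents by file extension."""
    counts = Counter()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower().lstrip(".") or "unknown"
            counts[ext] += 1
    return counts

# Demo on a throwaway directory standing in for one repository.
demo = tempfile.mkdtemp()
for fname in ("a.pdf", "b.pdf", "c.docx", "d.eml"):
    open(os.path.join(demo, fname), "w").close()

counts = survey_repository(demo)
print(dict(counts))  # document counts per type for this repository
```

Running this across every repository gives the inventory of locations and document types that the indexing roadmap is built from.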

Vinaya: How did you build the analytics solution?

Mohan: They had approximately 100,000 documents of different types. We started by building connectors to reach those locations and document types. Then we looked at sampling: how to sample each type of document, and how to build the meta-index we wanted. We took about 10,000 documents as a sample, covering all the different types, and built the model over that sample. The goal was to return the most relevant documents for a search rather than every document that merely matched the search criteria. We used our machine learning algorithms, some we already had and some we developed, and we filtered the output so that the researcher gets the ten most relevant documents for the search terms they entered.
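Mohan does not name the ranking algorithm, but a common way to return "the top 10 most relevant documents" rather than every match is TF-IDF weighting with cosine similarity. The sketch below is a generic illustration of that idea under those assumptions, not the client's actual model:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for a list of tokenized documents."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))                      # document frequency per term
    idf = {t: math.log(n / df[t]) for t in df}   # rarer terms weigh more
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * idf[t] for t in tf})
    return vectors, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def top_k(query, docs, k=10):
    """Return indices of the k documents most relevant to the query terms."""
    vectors, idf = tfidf_vectors(docs)
    tf = Counter(query)
    qvec = {t: tf[t] * idf.get(t, 0.0) for t in query}
    scores = [(cosine(qvec, v), i) for i, v in enumerate(vectors)]
    return [i for s, i in sorted(scores, reverse=True)[:k] if s > 0]

corpus = [
    "lens coating scratch resistance".split(),
    "contact lens hydration study".split(),
    "quarterly sales report".split(),
]
ranked = top_k("lens coating".split(), corpus, k=2)
print(ranked)  # indices of the two best-matching documents
```

In the real system this scoring would run over the meta-index of all 100,000 documents, with only the top 10 results returned to the researcher.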

Vinaya: Did the client see success with the solution?

Mohan: First and foremost, they no longer had to go to various repositories to search. Second, they saved effort in searching: search time dropped from several minutes to a few seconds. Third, once they had the top 10, they rarely needed to look at any other documents; most of the information they wanted was in the top-10 list we provided.

Vinaya: Thank you very much, Mohan. Thank you all for watching.