Data Scientist

Location: Mumbai
Job Type: Permanent
Reference: 04_010819
Sector: Technology

Responsibilities: 

  • Develop and apply cutting-edge machine learning algorithms and statistical pattern recognition to extremely large text corpora in the capital markets domain
  • Use statistical natural language processing to mine unstructured data and create insights; analyze and model structured data using advanced statistical methods, and implement the algorithms and software needed to perform these analyses
  • Build document clustering, topic analysis, text classification, named entity recognition, sentiment analysis, and part-of-speech tagging methods for unstructured and semi-structured data  
  • Cluster and analyze large amounts of user-generated content and process data in large-scale environments using Amazon EC2, Storm, Hadoop and Spark
  • Develop and perform text classification using methods such as logistic regression, decision trees, support vector machines and maximum entropy classifiers  
  • Perform text mining, generate and test working hypotheses, prepare and analyze historical data and identify patterns  
  • Generate creative solutions (patents) and publish research results at top conferences (papers)
  • Technology stack for the resulting application: data storage and analytics on an AWS or Cloudera (on-premise) Hadoop ecosystem with MongoDB and Elasticsearch, on S3 or on-premise

Requirements:

  • Advanced degree from an accredited college/university in Computer Science, Computational Linguistics, Applied Math or Statistics, Engineering, Bioinformatics, Physics, Operations Research, or a related field (strong math/stats background with an ability to understand algorithms and methods from both mathematical and intuitive viewpoints)
  • In-depth knowledge of various NLP domains such as entity extraction, speech recognition, topic modeling, machine translation, natural language understanding, parsing, question answering, etc.
  • Expertise in text mining (probabilistic topic models, word association mining, ontology learning, opinion mining and sentiment analysis, semantic similarity, etc.)
  • Expertise in natural language processing/understanding (word representation, sentiment analysis, relation extraction, natural language inference, semantic parsing, etc.)  
  • Excellent background in machine learning (generative models, discriminative models, neural networks, regression, classification, clustering, etc.)
  • Experience with deep learning for NLP/NLU is a big plus