Mastech InfoTrellis
Data Scientist - Optical Character Recognition
Job Location
Bangalore, India
Job Description
Job Description: Data Scientist
Company: Mastech Digital
Location: Bangalore Urban, Karnataka, India
Position Type: Full Time
Duration: Permanent
Notice Period: Immediate Joiner / Serving Notice / Less than 30 Days
Experience: 5 Years

About the Role:
Mastech Digital is seeking a highly skilled and experienced Data Scientist to join our dynamic team. In this role, you will develop and deploy advanced AI models, with a focus on OCR, LLMs, and computer vision. You will work within the AWS ecosystem, following best practices for code quality, data security, and model deployment. The position requires a strong understanding of machine learning techniques and cloud technologies, and the ability to collaborate effectively with cross-functional teams.

Responsibilities/Duties:

AI Model Development and Deployment:
- Train and fine-tune OCR models and Large Language Models (LLMs).
- Develop and implement computer vision models for object detection and segmentation.
- Deploy and maintain models in production, collaborating with software engineers.

Cloud Infrastructure and Architecture:
- Use AWS services, including SageMaker, Bedrock, Lambda, S3, and API Gateway, for model development and deployment.
- Adhere to the AWS Well-Architected Framework to build robust, scalable solutions.

Data Management and Security:
- Perform data cleaning and preprocessing to ensure high-quality training data.
- Ensure data confidentiality and implement HIPAA compliance measures.

Software Development Practices:
- Follow internal best practices for code monitoring, testing, and version control.
- Implement CI/CD pipelines using Jenkins and other relevant tools.
- Conduct thorough QA and application testing.

Model Evaluation and Optimization:
- Test models rigorously to ensure accuracy and reliability.
- Compare the feasibility of different models and select the most appropriate solution.
- Fine-tune LLMs (Mistral, Llama, and other open-source models) and perform prompt tuning.

Collaboration and Communication:
- Collaborate with other data scientists to divide work and ensure timely project completion.
- Meet deadlines tied to weekly/bi-weekly meetings and provide regular updates.
- Create data visualizations to communicate results to non-technical stakeholders.
- Test and implement NER models.

Hugging Face and Related Technologies:
- Familiarity with Hugging Face packages.

Skills:

Programming and Data Science:
- Proficient in Python.
- Strong SQL skills.
- Experience with data cleaning and big data processing.
- Experience with OCR and NER models.

Cloud Technologies (AWS):
- Extensive experience with AWS SageMaker, Bedrock, Lambda, S3, and API Gateway.
- Proficiency with the Amazon Textract API.

Machine Learning and AI:
- Experience training and fine-tuning LLMs (Mistral, Llama, etc.).
- Proficiency in prompt tuning.
- Experience with computer vision models for object detection and segmentation.

DevOps and CI/CD:
- Experience with CI/CD pipelines and version control systems.
- Proficiency with Jenkins.

Hugging Face:
- Familiarity with Hugging Face packages.

Qualifications:
- 5 years of experience as a Data Scientist.
- Bachelor's or Master's degree in Computer Science, Data Science, or a related field.
- Strong understanding of machine learning algorithms and techniques.
- Excellent problem-solving and analytical skills.
- Strong communication and collaboration skills.
- Ability to work independently and as part of a team.

Preferred Qualifications:
- Experience with healthcare data and HIPAA compliance.
- AWS certifications.
- Experience with advanced computer vision techniques.

(ref:hirist.tech)
Location: Bangalore, IN
Posted Date: 5/7/2025
Contact Information
Contact: Human Resources, Mastech InfoTrellis