Formel D Group
Data Scientist
Job Location
Sorocaba, Brazil
Job Description
- Build software systems based on various machine learning (ML) techniques to drive innovation.
- Explore large volumes of data, and create and process datasets for training and evaluating machine learning models.
- Carry out data analysis and statistical processing, data sanitisation, and feature engineering appropriate to the problem domain.
- Explore, train, tune, and evaluate machine learning models based on classical techniques (XGBoost, genetic learning, K-means, PCA) and state-of-the-art techniques (e.g. neural networks, generative adversarial networks, reinforcement learning, foundation models).
- Identify opportunities for supervised, unsupervised, and semi-supervised learning according to the task.
- Master data augmentation and regularisation techniques for images, time series, tabular data, and natural language processing.
- Develop and deploy machine learning-based solutions in production.
- Collaborate with other development teams to integrate trained models into existing systems, helping to define the best architectures and data pipelines.
- Write, develop, evaluate, document, and deploy new and modified machine learning models using the most advanced technologies, services, frameworks, and libraries on the market.
- Carry out manual tests and prepare trained models for installation on different infrastructures, e.g. cloud and embedded devices.
- Prepare and present technical reports and recommendations on ongoing projects, including continuous exploration of state-of-the-art technologies in artificial intelligence and machine learning.
- Knowledge of computer vision, federated learning, LLMs, use of GPUs, etc. is desired.
- Work with agile methodology in multidisciplinary teams focused on delivering value to the business and end customer, using Scrum or similar approaches.
Requirements
- Bachelor's degree
- Advanced English
- Build and train ML and AI models
- Manipulation of complex and heterogeneous data: cleaning and sanitising, deduplication, and dataset construction
- Programming languages and tools used in AI: Python, Scala, R (desirable), and PySpark
- AI libraries: TensorFlow, Caffe, Keras, Torch, etc.
- Applied knowledge of algorithms, statistics, regression models, decision trees, neural networks, etc.
- Experience with time series, image processing, and other complex data types
- Data Lake (Hadoop ecosystem, Spark, Kafka, etc.)
- Microsoft Azure Cloud
- Integration, automation, and white-box testing
Location: Sorocaba, São Paulo, BR
Posted Date: 8/19/2025
Contact Information
Contact: Human Resources, Formel D Group