- Build pipelines to ingest a wide variety of data from multiple sources within the organization and from external sources (e.g., government, vendors, and social media).
- Optimize existing pipelines.
- Test data structures to ensure they are fit for use by data analysts and data scientists.
- Prepare and maintain environments for secure prototyping, development, testing and data manipulation for data scientists.
- Design and implement effective data storage solutions and models.
- Assist with the deployment of data models and AI/ML solutions.
- Assess database implementation procedures to ensure they comply with internal and external regulations.
- Prepare accurate database design and architecture reports for management and executive teams.
- Oversee the migration of data from legacy systems to new solutions on Cloud infrastructure.
- Monitor the system performance by performing regular tests, troubleshooting, and integrating new features.
- Automate low-value tasks.
Technical skills and work experience
- Experience with at least one major Cloud Infrastructure provider (Azure/AWS/GCP)
- Experience building data pipelines using batch processing with Apache Spark (Spark SQL, Dataset / DataFrame API) or Hive query language (HQL)
- Knowledge of Big Data ETL processing tools
- Experience in data modelling and data mapping for Data Warehouse and Data Mart solutions
- Experience with Hive and Hadoop file formats (Avro / Parquet / ORC)
- Basic knowledge of scripting (shell / bash)
- Experience working with multiple data sources, including relational databases (SQL Server / Oracle / DB2 / Netezza), NoSQL / document databases, and flat files
- Understanding of CI/CD tools such as Jenkins, JIRA, Bitbucket, Artifactory, Bamboo, and Azure DevOps.
- DevOps practices using Git version control
- An interest in staying up to date with industry standards and technological advancements that will improve the quality of your outputs.
- Ability to debug, fine-tune, and optimize large-scale data processing jobs.
- Highly capable in:
- Python or Scala
- SQL
- Databricks
- Knowledge in:
- Azure Data Factory
- Azure DevOps
- Bitbucket or GitHub
- Machine learning
- MLFlow
- ML (Machine Learning) frameworks (e.g., scikit-learn, TFX, PyTorch).
- Knowledge of life insurance industry preferred.
- Flexibility, creativity, and the capacity to receive and utilize constructive feedback.
- Curiosity and outstanding interpersonal skills.
- Ability to work collaboratively as a team player.
- Capacity to successfully manage a pipeline of duties with minimal supervision.
- Master’s degree or equivalent work experience in Computer Science or a related discipline.
- Certifications in Data Engineering from reputable MOOCs or cloud vendors are welcome.
Language: Fluent written and spoken English