
Harsh Joshi

Senior Data Engineer

Senior Data Analyst

Business Intelligence Analyst

As a seasoned professional with experience as a Senior Data Engineer, Senior Data Analyst, and Business Intelligence Analyst, I bring a holistic approach to managing and leveraging data. My expertise spans advanced statistical analysis, data science, data engineering, and web crawling, which I use to design data collection pipelines that yield meaningful, actionable insights. What sets me apart is my ability to thrive in diverse, fast-paced environments, combining technical depth with strategic thinking to solve complex challenges. I excel at automation, optimizing data pipelines, and building scalable systems that deliver tangible business value, and my results-oriented mindset keeps every project aligned with the goals of efficiency and accuracy. Beyond technical skills, I am an effective communicator, able to translate intricate data insights into clear, impactful business strategies, and I continuously refine my craft to stay at the forefront of technological advancements.

About Me

I am a seasoned Senior Data Engineer with over 5 years of experience across a diverse range of technologies, including Python, SQL, MongoDB, Spark, Hadoop, machine learning, data modeling, and ETL implementation. My track record reflects deep expertise in designing and implementing scalable data pipelines, building machine learning models, and translating business requirements into effective data architectures.

Core Competencies

  • Data Engineering: Proficient in Python and SQL, with extensive hands-on experience in big data technologies such as Hadoop, Spark, and MongoDB.
  • Machine Learning: Adept at developing predictive models to aid organizations in making informed, data-driven decisions.
  • ETL Implementation: Skilled in creating efficient ETL pipelines and leveraging technologies like Azure Data Factory and AWS Glue.
  • Data Architecture: Capable of designing scalable databases and optimizing data pipelines for enhanced performance.
  • Data Analysis: Experienced in analyzing complex datasets to extract actionable insights and drive business decisions.
  • Power BI: Proficient in creating interactive and insightful dashboards using Power BI.
  • Data Visualization: Skilled in visualizing data effectively to communicate insights and trends clearly.
  • Data Governance & Security: Knowledgeable in data governance practices, data privacy, and security to ensure compliance and protection of data.
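The ETL competency above can be illustrated with a minimal sketch: a toy extract-transform-load pass over CSV data using only the Python standard library. The field names and cleaning rules are purely illustrative, not taken from any specific project.

```python
import csv
import io

def extract(csv_text):
    """Extract: parse raw CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def transform(rows):
    """Transform: drop incomplete rows and normalize fields."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip rows with a missing amount
        cleaned.append({"region": row["region"].strip().title(),
                        "amount": float(row["amount"])})
    return cleaned

def load(rows):
    """Load: aggregate per region (a real pipeline would write to a warehouse)."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals

raw = "region,amount\neast ,10.5\nwest,\nEAST,4.5\n"
print(load(transform(extract(raw))))  # → {'East': 15.0}
```

Keeping extract, transform, and load as separate functions makes each stage independently testable, which is the same principle that tools like Azure Data Factory and AWS Glue apply at pipeline scale.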

Key Strengths

  • Bridging Business and Technical Requirements: Ensuring effective data solutions that align with organizational goals.
  • Project Management: Expertise in organizing, planning, and executing data-related projects effectively, ensuring timely delivery.
  • Continuous Learning: Dedicated to staying updated with the latest advancements in data technologies.
  • Adaptability: Quick to grasp new concepts and technologies, adapting seamlessly to evolving project demands.
  • Effective Communication: Adept at active listening and clear communication, fostering collaborative problem-solving environments.
  • Independent and Teamwork: Proven ability to thrive both independently and within collaborative team settings.
  • Attention to Detail: Known for maintaining focus and diligence in completing tasks accurately and promptly.
  • Innovative Problem-Solving: Skilled at tackling complex data challenges with creative and efficient solutions.
  • Data Governance Expertise: Ensuring data security, privacy, and compliance in all data management processes.
  • Advanced Data Visualization: Crafting compelling visualizations that effectively communicate insights and support decision-making.

Professional Proficiencies

  • Over 5 years of expertise in Python, PySpark, and Java for data tasks.
  • Experience in designing, implementing, and optimizing SQL and NoSQL databases, including PostgreSQL.
  • Skilled in real-time processing and data warehousing, using Azure Data Factory and AWS Glue.
  • Proficient in Azure Databricks, Delta Live Tables, AWS S3, EC2, Lambda, and Redshift.
  • Well-versed in CI/CD pipelines, GitHub, JIRA, and DevOps methodologies.
  • Knowledgeable in time-series analysis, data mining, and data-detective work.
  • Expertise in data governance, integration, and privacy practices.
  • Adept at translating business requirements into data models and quantitative queries.
  • Strong background in Git for collaboration and code management.
  • Experienced in functional testing methodologies to ensure the reliability and quality of data solutions.

Technologies

Python
PySpark
SQL
Azure
Databricks
Delta Live Tables
Data Factory
PostgreSQL
NoSQL
MongoDB
Power BI
Data Engineering
Data Analysis
Extract Transform Load (ETL)
Apache
Model Building
Data Server
Data Cleaning
Cloud Computing
Linux
AWS
Data Detective
EC2
DynamoDB
Lambda Functions
Data Security
Data Privacy
Ubuntu
HTML
CSS
Continuous Ingestion
JIRA
GIT
Agile
Scrum
Java

In my journey as a Data Engineer and Analyst, I have learned that true proficiency goes beyond mere functionality. While I have honed the skills to write code that meets immediate objectives, I continuously strive for excellence by developing maintainable solutions that support a business's long-term goals. I am committed to navigating complex challenges with agility and simplifying solutions to ensure clarity and ease of maintenance for seamless future operations. I am eager to keep learning and growing in my career, embracing every opportunity to enhance my skills and contribute effectively.

Python

Data Engineering ETL Pipeline

Amazon Web Services

Database (Cloud, NoSQL, MySQL, MongoDB)

Data Analysis / Visualization - Power BI

Data Governance / Data Integration / Data Privacy

Automation / Web Crawling / Web Scraping

Continuous Integration - CI/CD, GitHub, DevOps

Microsoft Azure / Databricks / Delta Live Tables

Project Management - Capacity Planning

Academic Background

University of Missouri

Master's in Data Science and Data Analytics with High Performance Computing

Aug 2021 - May 2023

Key Areas of Study:

  • Big Data Frameworks: Utilizing state-of-the-art big data frameworks like Apache Hadoop, Apache Spark, and Apache Flink.
  • Cutting-Edge Databases: Implementing advanced database technologies such as graph, time-series, and distributed databases.
  • Machine Learning and AI: Applying techniques including deep learning, natural language processing, and computer vision.
  • Data Streaming and Real-Time Analytics: Leveraging platforms like Apache Kafka and Apache Pulsar for real-time data processing.
  • Containerization and Orchestration: Implementing Docker and Kubernetes for streamlined deployment and scalability.
  • Data Visualization Tools and Libraries: Exploring tools such as Tableau, Plotly, and D3.js for interactive data representations.
  • Professional Ethics: Identifying social, legal, and ethical issues in data science and applying professional standards.

Rajiv Gandhi Proudyogiki Vishwavidyalaya

Bachelor of Engineering in Computer Science

July 2014 - December 2018

Key Areas of Study:

  • Application of Computing and Mathematical Knowledge: Proficiency in algorithms, data structures, and mathematical modeling.
  • Problem Identification and Solution Development: Formulating problem statements and developing effective solutions.
  • System Design and Evaluation: Designing and evaluating computational systems.
  • Team Collaboration: Working collaboratively in multidisciplinary teams.
  • Professional and Ethical Awareness: Adhering to professional, ethical, legal, and security principles.
  • Effective Communication: Communicating complex concepts clearly and persuasively.
  • Societal Impact Analysis: Analyzing the broader societal implications of computing advancements.

Work Experience

Children's Mercy Kansas City - Data Engineer

Kansas City, Missouri, United States

May 2023 – Present

Key Responsibilities:

  • Fine-tuned data pipelines for increased efficiency.
  • Crafted complex scripts in Python for data processing and analysis.
  • Generated comprehensive analysis reports using Microsoft Office.
  • Orchestrated seamless data workflows using Azure Data Factory.
  • Conducted rigorous data-driven tests using SQL to ensure data integrity.
  • Executed projects and tasks seamlessly with strong time management skills.
  • Led optimization of ETL processes with Apache Spark and Azure Databricks.
  • Managed streaming and batch data through Delta Live Tables in Azure Databricks.
  • Embraced DevOps practices and proficiently managed pipelines to foster efficient development cycles.
  • Developed interactive data visualizations using Power BI to support business insights and decision-making.

Skills Acquired:

  • Data Engineering
  • Microsoft Power BI
  • Extract, Transform, Load (ETL)
  • Data Wrangling
  • SQL
  • Azure Databricks
  • Microsoft Office
  • Azure Data Factory
  • Apache Spark
  • Python (Programming Language)
  • Time Management
  • Delta Live Tables

University of Missouri - Graduate Research Assistant, Data Engineering

Columbia, Missouri, United States

April 2022 – May 2023

Key Responsibilities:

  • Developed and implemented various Machine Learning Components, including Model Metrics Evaluation, Data Analyzer, Model Executor, and Model Ensemble.
  • Constructed Docker images for seamless deployment, showcasing proficiency in orchestrating robust and scalable solutions.
  • Demonstrated expertise in Entity Extraction from both Structured and Unstructured Data, efficiently processing complex datasets to extract relevant information.
  • Leveraged Knowledge Graphs to develop predictive models for task assignments, enhancing decision-making by predicting optimal teams of individuals for specific tasks.
  • Created insightful analysis reports and maintained effective communication with lab researchers, fostering collaboration and knowledge sharing within multidisciplinary teams.

Skills Acquired:

  • JSON Data Extraction
  • Knowledge Graph
  • Time Management
  • Data Wrangling
  • Streamlining Data Collection Processes
  • Collaboration and Teamwork
  • Database Management (SQL and NoSQL)
  • Adaptability
  • Active Listening
  • Report Writing
  • Publishing Papers
  • DevOps

The Housing Collective - Data Analyst Intern

United States

June 2022 – Aug 2023

Key Responsibilities:

  • Conducted comprehensive analyses of products, services, markets, and advertising opportunities using data on customers, competitors, and marketing channels.
  • Utilized Crystal Reports to summarize and analyze data on homeless individuals collected by The Housing Collective across the United States.
  • Translated complex data sets into impactful presentations and actionable intelligence, supporting informed decision-making.
  • Created interactive and visually compelling dashboards using Power BI to effectively represent data and facilitate insights.
  • Implemented data management strategies, including removing unused data and filtering, resulting in a 50% improvement in database efficiency.
  • Designed and implemented efficient data flow processes, achieving a 60% increase in streaming efficiency for real-time data analysis.
  • Applied statistical techniques and data mining algorithms to identify trends and patterns, supporting data-driven strategies.
  • Collaborated with cross-functional teams to extract and communicate relevant insights from data, enhancing decision-making processes.

Skills Acquired:

  • Statistical Analysis
  • Data Mining
  • Trend Analysis
  • Process Improvement
  • Cross-Functional Collaboration
  • Strategic Planning
  • Database Efficiency Improvement
  • Data Cleaning and Preprocessing
  • Active Listening
  • Feature Engineering
  • A/B Testing

Affimintus Technologies - Data Engineer

Indore, MP, India

Jan 2020 – July 2021

Key Responsibilities:

  • Provided leadership within agile teams, offering and receiving design feedback to ensure alignment with business requirements and adherence to industry standards.
  • As a Data Architect, designed and maintained high-performance ELT/ETL processes, incorporating predictive analytics techniques such as machine learning and data mining to achieve a 95% accuracy rate in output forecasting.
  • Mentored junior team members, ensuring adherence to coding standards and fostering a collaborative work environment.
  • Automated data processing tasks using Python and libraries like Pandas and Beautiful Soup, significantly enhancing efficiency.
  • Generated comprehensive management data reports and presented key insights to executive stakeholders.
  • Led the redesign and development of critical ingestion pipelines, enabling smooth processing of large data volumes and delivering innovative solutions to complex data challenges.
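The HTML-scraping automation mentioned above can be sketched in miniature. This toy version uses the standard library's `html.parser` in place of Beautiful Soup to stay dependency-free; the table layout and the `price` class name are hypothetical.

```python
from html.parser import HTMLParser

class PriceExtractor(HTMLParser):
    """Collect the numeric value of every <td class="price"> cell."""
    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        if tag == "td" and ("class", "price") in attrs:
            self._in_price = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_price = False

    def handle_data(self, data):
        if self._in_price and data.strip():
            self.prices.append(float(data.strip().lstrip("$")))

page = ('<table><tr><td>A</td><td class="price">$9.99</td></tr>'
        '<tr><td>B</td><td class="price">$4.01</td></tr></table>')
parser = PriceExtractor()
parser.feed(page)
print(f"{sum(parser.prices):.2f}")  # → 14.00
```

Beautiful Soup expresses the same idea more concisely (`soup.select("td.price")`), but the event-driven parser above shows what happens underneath and scales to large documents without building a full tree.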

Skills Acquired:

  • Data Analyzer Implementation
  • Agile Methodologies
  • CI/CD Practices
  • Python Scripting
  • 80% Improvement in Data Flow and Processing Speed
  • Extraction and Communication of Relevant Insights
  • Real-time ETL Processes with Azure Logic Apps and AWS
  • Automation of Manual Data Reconciliation Process
  • Mentorship to Junior Team Members
  • Continuous Execution (EC2 Servers)
  • Efficient Data Storage Practices

6DegreesIT - Data Engineer

Indore, MP, India

May 2018 – Dec 2019

Key Responsibilities:

  • Spearheaded the establishment of high-availability data infrastructure, resulting in a 90% increase in data availability and ensuring uninterrupted access to critical business data.
  • Developed a robust data monitoring and alerting system to proactively identify and resolve production issues, mitigating potential impacts on business operations.
  • Implemented a reproducible data pipeline, reducing delivery time by 60% and streamlining insights delivery to stakeholders.
  • Designed and enforced best practices for continuous process automation in data ingestion and pipeline workflows, enhancing operational efficiency.
  • Led the design and development of logical and physical Data Models and authored PL/SQL code for data conversion in the Clearance Strategy Project, showcasing expertise in data architecture and conversion methodologies.
  • Collaborated with Development and QA teams to institute a build schedule and address data quality issues, utilizing Regular Expression codes for data parsing.
  • Managed the entire software development life cycle, from design and documentation to implementation, testing, and deployment, including designing table structures and reporting formats for global reports.
  • Enhanced visualization for customers and development team contractors through well-designed reporting formats.
  • Orchestrated comprehensive testing cycles, including Functional Testing, System Integration Testing, Database Testing, Regression Testing, and User Acceptance Testing.
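The regular-expression data parsing mentioned above can be sketched as follows. The pipe-delimited record format and status codes are invented for illustration; the pattern of validating each line and routing malformed ones to a QA queue is the general technique.

```python
import re

# Hypothetical fixed-format record: "ID|YYYY-MM-DD|STATUS"
RECORD = re.compile(r"^(?P<id>\d+)\|(?P<date>\d{4}-\d{2}-\d{2})\|(?P<status>OK|FAIL)$")

def parse(lines):
    """Return valid records as dicts; collect malformed lines for QA review."""
    good, bad = [], []
    for line in lines:
        match = RECORD.match(line.strip())
        if match:
            good.append(match.groupdict())
        else:
            bad.append(line)
    return good, bad

good, bad = parse(["101|2019-06-01|OK", "102|2019-6-1|OK", "103|2019-06-02|FAIL"])
print(len(good), len(bad))  # → 2 1
```

Named groups (`?P<id>` etc.) keep downstream code readable, and keeping the rejects rather than silently dropping them is what makes the build-schedule QA collaboration described above possible.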

Skills Acquired:

  • Version Control Systems (e.g., Git)
  • Data Manipulation and Analysis Libraries in Python (e.g., Pandas, NumPy)
  • Cloud Computing Platforms (e.g., AWS, Azure, Google Cloud Platform)
  • Continuous Learning and Adaptability to New Technologies
  • Problem-Solving and Troubleshooting Skills
  • SQL Database Management and Querying
  • Data Serialization (JSON, XML)
  • Web Scraping (BeautifulSoup, Scrapy)
  • Java
  • Automation Tools: Selenium, TestNG

Thank you for visiting my portfolio! I'm excited to share my accomplishments and skills with you, and I hope you find the content both informative and engaging. If you have any questions or would like more information, please feel free to reach out.

Thank you again for your interest, and I look forward to any future interactions we may have.