Data & AI Enthusiast

Data Scientist & Machine Learning Engineer

I’m an Applied AI and Data Systems professional focused on turning complex data into robust, production-ready intelligence across machine learning, MLOps, and analytics engineering. I’ve led projects in the banking, airline, and healthcare industries, designing forecasting models, fraud and risk solutions, and BI experiences that directly influence decisions at scale. I enjoy owning problems end to end, from SQL pipelines, cloud-based ML, and model monitoring to dashboards and storytelling that make insights clear, trustworthy, and actionable. My work sits at the intersection of AI engineering, data platforms, and user-centric design, and I’m always looking to build systems that are not just accurate, but reliable, interpretable, and aligned with real-world impact.

Rakesh Sarma Karra
What I Do

Machine Learning & Intelligent Analytics

  • Design and deploy supervised, unsupervised, and time series models to solve real business problems across banking, airlines, healthcare, and nonprofit domains.
  • Focus on robust feature engineering, model evaluation, and interpretability techniques (e.g., SHAP, KPI analysis) to make model behavior transparent and actionable.
  • Translate model outputs into clear recommendations that help stakeholders make confident, data-driven decisions.

Data Engineering & Scalable Pipelines

  • Build ETL and ELT pipelines using SQL, Apache Spark, PySpark, Airflow, and dbt to move data reliably from source systems into analytics‑ready models.
  • Engineer data workflows that emphasize data quality, validation, and reproducibility across development and production environments.
  • Optimize performance and scalability so models, reports, and dashboards continue to work as data volume and complexity grow.

Business Intelligence & Decision Dashboards

  • Develop interactive dashboards in Power BI, Tableau, and Looker Studio that turn complex data into clear stories for operations, risk, and leadership teams.
  • Design drill‑down views, filters, and KPIs that support day‑to‑day monitoring and deeper “why” analysis across credit risk, marketing, and operations.
  • Align visualizations with how stakeholders think about the business, making insights intuitive, trustworthy, and easy to act on.

MLOps, Cloud & Responsible AI

  • Run ML and analytics workloads on Azure, AWS, and GCP, from experimentation through deployment, using Git‑based workflows and CI/CD practices.
  • Integrate monitoring, logging, and version control to make models easier to maintain, iterate on, and roll back when needed.
  • Incorporate data governance, regulatory reporting, and bias/fairness considerations so systems are accurate, compliant, and auditable.
Experience

Data Scientist – Community Dreams

April 2025 – Present United States
  • Developed and evaluated machine learning models in Python to analyze community program data (e.g., participation, outcomes, engagement), generating predictive insights that supported data-driven decision-making for nonprofit initiatives.
  • Built end-to-end analytical workflows using Python, SQL, and Excel to clean raw community datasets, engineer features, train baseline models, and visualize key findings for stakeholders, documenting the work on GitHub for version control and collaboration.

Machine Learning Engineer (Volunteer) – Murphy Charitable Foundation

Nov 2025 – Present United States
  • Contributing as an ML Engineer volunteer to design and prototype machine learning models in Python for Murphy Charitable Foundation’s new application supporting vulnerable communities (e.g., child sponsorship and donor engagement use cases).
  • Building and iterating on data pipelines using Python and SQL (data cleaning, feature engineering, basic model training), with experiments and notebooks version-controlled through Git and regularly pushed to GitHub.

Teaching Assistant (Paid) & Research Data Scientist (Volunteer) – University of North Texas

January 2024 – December 2024 Denton, United States
  • Designed and delivered Python-based lab sessions covering advanced machine learning and big data analytics.
  • Conducted AI bias research, identifying and analyzing system, developer, and statistical biases in ML models.
  • Designed an AI reliability survey, which revealed that 68% of participants believed AI enhanced performance and provided insights into user trust and adoption.
  • Performed a comparative analysis of bias, hate speech detection, and sentiment classification across ChatGPT, Gemini, Meta AI, and Claude.

Data Scientist Intern – American Airlines

September 2023 – December 2023 United States - Remote
  • Improved flight demand forecasting accuracy 78% using Random Forest with advanced feature engineering and time-series trend analysis.
  • Automated Python/SQL data pipelines, reducing processing time by 60% and uncovering operational behavior patterns.
  • Built a congestion analysis framework using statistical metrics to support infrastructure capacity planning.

Data Scientist – Citi Bank

November 2021 – January 2023 India
  • Developed an automated feature engineering and fraud detection pipeline using Featuretools and XGBoost, improving model accuracy for high-risk transaction detection and data imputation by 15%.
  • Designed and orchestrated ETL workflows in Apache Airflow to migrate data from CMR to MDM, increasing data stewardship reliability to 76% and stabilizing downstream analytics.
  • Streamlined ingestion of large unstructured zip files into SAS by combining UNIX utilities (unzip, grep, awk) with SAS scripts, reducing data loading and preprocessing time by 40% compared to legacy workflows.
  • Enhanced data quality audits and compliance analytics using advanced SQL (CTEs, joins, subqueries, CASE WHEN) under CCPA 2020, accelerating validation and regulatory reporting.
  • Collaborated with engineering and product development teams to automate profiling and compliance reports using Python (Pandas) and Autosys/Bitbucket, reducing manual validation by ~30% and strengthening data integrity monitoring.

Data Analyst & Analytics Reporting – ICICI Bank

March 2019 – October 2021 India
  • Built and validated predictive risk models in SAS Enterprise Miner and Python (Scikit-learn) for loan default prediction across products such as personal loans, gold loans, and fixed deposits, improving underwriting accuracy by 12%.
  • Implemented K-Means clustering in SAS and Python on 4M+ customer records to segment portfolios by behavior and holdings, enabling targeted cross-sell campaigns that increased conversion by 22%.
  • Automated portfolio performance reporting using PROC REPORT, dynamic SAS macros, and PROC SQL, cutting manual reporting time by 60% and providing near real-time visibility into loans, transactions, and customer behavior.
  • Optimized data pipelines with PROC SORT and macro-driven workflows, reducing query latency by 35% for fraud and compliance reports across 20+ banking products.
  • Prototyped VB portfolio dashboards and delivered executive-ready views for BI, Risk, and senior leadership stakeholders.
Projects
Predicting Unengaged Medicare Advantage Members

Humana - Texas A&M Healthcare Analytics Case Competition

Built an XGBoost engagement model (ROC-AUC 0.76) with SHAP insights and KPIs to guide targeted outreach and quantify business impact for Medicare Advantage members.

Python, Google Cloud Platform, SQL, Machine Learning
View on GitHub →
Attribute Based Truck Warranty Modeling

Peterbilt – UNT Business Analytics Hackathon

Developed a multiclass XGBoost model to predict heavy-duty truck warranty cost segments from configuration and claims data, improving macro ROC-AUC to 0.80 and surfacing high‑risk option bundles for design and pricing decisions.

Python, Jupyter Notebook, SQL, Machine Learning
View on GitHub →
Destination Recommendation Engine – Customer Travel Behavior Modeling

American Airlines Hackathon

Developed ML-based features and prototypes to analyze customer travel behavior and surface actionable insights for destination recommendation and operational planning.

Python, Snowflake, SQL, Machine Learning
View on GitHub →
Teaching Assistant Eligibility Predictor

Teaching Assistant Eligibility Predictor using Deep Learning Models

Built a model in R to streamline the teaching assistant selection process, evaluating over 1,200 applications and assigning probability scores to identify top candidates.

R, RShiny, RStudio, Neural Networks
View on GitHub →
SpaceX Falcon 9 Landing Success Prediction for Reusable Rocket Cost Optimization

SpaceX Falcon 9 Landing Success Prediction

Built an end-to-end classification pipeline on SpaceX launch data, including API/web scraping, SQL and Python-based EDA, feature engineering, and model evaluation to predict Falcon 9 landing success and support reusable-rocket cost optimization.

Python, Web Scraping, Predictive Analytics, Data Visualization
View on GitHub →
AHI App Development - Project Management Capstone

Led an end-to-end project management capstone to design and plan the AHI Marketing Data App, replacing fragmented, manual market-tracking processes with a single real-time decision support application.

Project Management, Stakeholder Management, Work Breakdown Structure, Agile Planning
View on GitHub →
Philippines MCCT Program Dashboard

Designed an interactive Tableau dashboard on Pantawid Indigenous Peoples MCCT data to profile beneficiaries by region, age, and gender, highlight underserved provinces, and surface insights for program coverage, education, and healthcare planning.

Tableau, Tableau Prep
View on GitHub →
Education

M.S. Mathematics & Statistics (Data Science & Advanced Analytics), Jan 2023 - Dec 2024

University of North Texas, United States

Certifications
IBM Data Science Professional Certificate - 11
IBM Data Science Professional Certificate
IBM What is Data Science
IBM Tools for Data Science
IBM Data Science Methodology
IBM Python for Data Science, AI & Development
IBM Python Project for Data Science
IBM Databases and SQL for Data Science with Python
IBM Data Analysis with Python
IBM Data Visualization with Python
IBM Machine Learning with Python
IBM Applied Data Science Capstone
Artificial Intelligence Certificates - 10
AI Fundamentals Certificate
AI Python for Beginners
Generative AI
Generative AI - Prompt Engineering Basics
Generative AI - Impact, Considerations and Ethical Issues
Leveraging AI for Enhanced Content Creation
OpenAI GPTs - Creating Your Own Custom AI Assistants
Innovative Teaching with ChatGPT
Artificial Intelligence Data Fairness and Bias
Artificial Intelligence (AI) Education for Teachers
Data Camp Certificates - 5
Exploratory Data Analysis in R
Introduction to Regression in R
Intermediate Regression in R
Machine Learning with caret in R
Building Web Applications with Shiny in R
Other Certificates - 4
Project Management Capstone
Cloud Computing Foundations
Databricks Fundamentals
Hacker Rank - SQL
Honors & Recognitions
AI In Action 2025 – Exploring User Bias and Hallucinations in Generative AI Systems

Poster Presentation – 2025
2024 Humana-Mays Healthcare Analytics Case Competition

National Healthcare Analytics Case Competition – 2024
American Airlines Machine Learning Competition

Machine Learning Hackathon – 2024
UNT – Tuition Benefit Program (Spring 2024)

Jan 2024 – May 2024
UNT – Tuition Benefit Program (Fall 2024)

Aug 2024 – Dec 2024
Silver Award – Citi Bank

California Consumer Privacy Act 2020 (CCPA) – CMR project
Bronze Award – Citi Bank

CMR to MDM migration project
Work Excellence Award – ICICI Bank

Recognition for outstanding delivery and performance
Event & Pictures
AI In Action 2025
Humana Presentation
American Airlines Skyview #7, Fort Worth
Organizations & Workshops

UNT AI in Action – Research Presenter & Graduate Participant

Sep 2024 – Apr 2025 United States
  • Completed the Texas Higher Education Coordinating Board’s AI Professional Development Program, gaining hands-on experience in prompt engineering, custom GPTs, AI-enhanced content creation, ethical AI, and trustworthy generative AI frameworks.
  • Co-authored and presented two research posters at the UNT AI in Action Workshop as: Karra, R.S., Kota, M., Mahadasu, M.P., Rajidi, S., “Ethical Considerations in AI Adoption for Education and Research” (GitHub) and “Exploring User Bias and Response Variability in Generative AI Systems” (GitHub), under the mentorship of Dr. Zeynep Orhan.
  • Collaborated with a cross-functional team of four graduate students to investigate bias, hallucination patterns, and fairness concerns in LLM-based systems using systematic prompt–response experiments and statistical analysis.
  • Contributed to responsible AI adoption frameworks by synthesizing empirical findings on ethical risks, user trust, and governance into posters and technical documentation for an interdisciplinary Human–AI Collaboration workshop.

UNT Business Analytics Club – Member

Aug 2024 – Dec 2024 United States
  • Participated in speaker sessions and case-based workshops on business analytics, data visualization, and predictive modeling, gaining exposure to real-world use cases and tools.
  • Collaborated with peers in analytics challenges and networking events, strengthening problem-solving, presentation skills, and industry connections.

UNT Data Science Talk Series – Graduate Student Attendee

Jan 2024 – Dec 2024 United States
  • Attended weekly talks by academic and industry experts on machine learning, AI, data visualization, big data analytics, and ethical data practices, broadening exposure to emerging trends and best practices.
  • Engaged with speakers on real-world applications of advanced analytics and AI-driven decision-making in sectors such as retail, supply chain, healthcare, and technology.
  • Participated in collaborative learning sessions on deep learning frameworks, cloud-based ML tools, NLP, and computational data science techniques to strengthen technical depth.
  • Networked with data science professionals, researchers, and graduate peers during Thursday sessions, exploring career pathways in analytics, ML engineering, and AI research.

UNT Society for Student AI Innovation – Member

Jan 2024 – Dec 2024 United States
  • Participated in workshops on Hugging Face models, including practical tokenization sessions emphasizing LLMs for real-world AI applications.
  • The society promotes collaborative projects, research, and learning in AI/ML, is open to all majors, and builds skills in model deployment and innovation.
  • It engages members through events such as online workshops on AI usability, covering topics like fine-tuning LLMs for natural language processing tasks and ethical AI use.
  • Awarded a Participation Certificate for the online workshop “How to Use Hugging Face AI Models?” in Feb 2026, recognizing active engagement in LLM and AI usability sessions (Certificate Link)
Technical Skills

Programming & Scripting

Python (Pandas, NumPy, Scikit-learn, Statsmodels, Matplotlib, Seaborn), R (dplyr, ggplot2), SAS (SAS EG, SAS DI, SAS Viya), Excel (Pivot Tables, Power Query, Functions), SQL (CTE, window functions, joins, performance tuning)

Machine Learning & Predictive Modeling

Supervised Learning, Unsupervised Learning, Ensemble Models (XGBoost, LightGBM, CatBoost), Random Forests, Gradient Boosting, Time Series Forecasting (SARIMA, ARIMA), Clustering (K-Means, Hierarchical), Recommendation Systems, Anomaly Detection, Root Cause Analysis, Feature Engineering, Model Evaluation (ROC-AUC, Precision, Recall, F1-score, Accuracy), Bias & Fairness Testing

Statistical & Experimental Methods

Hypothesis Testing, A/B Testing, Experimental Design, Regression Modeling, Churn Analysis, Demand Forecasting, Segmentation, Confidence Intervals, p-value Interpretation, t-tests, Chi-square Tests, Correlation Analysis, Causality Analysis, Significance Testing

Data Engineering & MLOps

Apache Spark, PySpark, Airflow, Delta Lake, dbt (models, tests, docs), Data Quality Validation, ETL/ELT Pipeline Development

Cloud Platforms

Microsoft Azure (Data & AI stack), AWS (SageMaker, Redshift, EC2, S3), GCP (BigQuery, Vertex AI – exposure), Cloud-based ML & Analytics Workloads

Databases & Data Warehousing

PostgreSQL, Snowflake, Amazon Redshift, SQL Server, BigQuery, Delta Lake, Databricks, Data Modeling for Analytics & Reporting

Data Visualization & BI Tools

Power BI, Tableau, Tableau Prep, Looker Studio (Google Data Studio), Streamlit, Plotly, SAS Viya, Executive Dashboards, Interactive Dashboards, Drill-down Reporting

Fraud & Business Analytics Domains

Credit Portfolio Analytics, Fraud Detection, Compliance Reporting, Regulatory Reporting (CCPA 2020), Customer Retention, Customer Lifetime Value, Operational Performance Monitoring, Dynamic Pricing, Logistics Cost Optimization

Project Management

Agile & Scrum Methodologies, Work Breakdown Structure (WBS), Risk Identification & Mitigation Planning, Scope & Timeline Management, 30–70% Rules, Stakeholder Alignment & Status Reporting

Version Control & Collaboration

Git, GitHub, Bitbucket, Confluence, JIRA, Agile, Scrum, Stakeholder Communication, Data-Driven Storytelling

Soft Skills

Excellent Written and Verbal Communication Skills, Strong Problem-Solving Skills, Ability To Collaborate, Attention To Detail, Team Player, Self-Motivated

Recommendations

Rahul K

Senior Associate – Data Governance

View on LinkedIn

“I had the pleasure of working with Rakesh and consistently found him to be dependable, detail-oriented, and proactive in his approach. He communicates clearly, collaborates well, and takes ownership of his responsibilities. Rakesh would be a strong asset to any team he joins.”

Recommendation, January 30, 2026

John Schroeder

Data Scientist Principal – Lockheed Martin

View on LinkedIn

“I had the privilege of collaborating with Rakesh Sharma during his position as my teaching assistant for two graduate-level courses in the Department of Advanced Data Analytics at the University of North Texas. Rakesh was selected from a highly competitive pool of candidates, which underscores both his extensive knowledge in data analytics and his exceptional interpersonal skills.

His contributions were invaluable; he played a crucial role in researching course content, organizing materials, developing assessments, and providing insightful feedback. Rakesh approached his responsibilities with enthusiasm and consistently surpassed my expectations. His positive attitude and effective communication skills—both verbal and written—greatly enhanced the learning environment.

Rakesh has shown kindness and encouragement to students who seek his assistance, and he is always willing to share his experiences with fellow teaching assistants. Given his outstanding performance as my TA, his proven track record in previous roles, and his dedication to continuous learning, I am confident that he will be a significant asset to your organization.”

Recommendation, November 8, 2024

Let's Connect.

I am open to opportunities in Data Science, Machine Learning, and Analytics Engineering.