I’m a multilingual linguist transitioning into Data Science & NLP.
With a background in education, communication, and text analysis, I bring human-centered thinking into machine learning and data work. I’m especially interested in corpus curation, NLP, ethical AI, and accessibility through language technologies.
I’m currently training in Data Science & AI at WBS Coding School (Germany) while building portfolio projects in Python, SQL, and linguistic data.
- 🐍 Python
- 📊 SQL
- ☕ Java (foundational)
- Data Cleaning, EDA & Feature Engineering
- Statistical Analysis & Experiment Design (basics)
- NLP Fundamentals: text preprocessing, tokenization, labeling
- Corpus Review & Linguistic Data Annotation
- Model Evaluation & Error Analysis (introductory)
- Data & ML: Pandas, NumPy, Scikit-learn
- Visualization: Matplotlib, Seaborn, Tableau, Looker Studio
- Web & Data Access: Requests, BeautifulSoup, SQLAlchemy
- Backend: Django (basics)
- NLP: Hugging Face ecosystem (introductory)
- Google Cloud Platform (basics)
- Microsoft Azure (basics)
- Git & GitHub (collaboration, forks, PRs)
- Conda, Jupyter Notebook
- VS Code, Eclipse
- Relational Databases (SQL)
- NoSQL: MongoDB (basics)
- Structured thinking & analytical reasoning
- Clear communication of complex concepts
- Attention to detail (language, annotation, evaluation)
- User-centered and inclusive perspective
- Windows, macOS
- TCP/IP (introduction)
- Team collaboration
- Empathy & clear communication
- Analytical & structured thinking
- Problem solving
- Initiative & self-learning
A user-friendly book lending system designed to track borrowed books, return dates, and overdue items.
Focus on practical problem-solving, data modeling, and user-oriented design.
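The core idea of tracking return dates and overdue items can be illustrated with a minimal Python sketch. This is not the project's actual code: the `Loan` record, its fields, and the `overdue_loans` helper are hypothetical names chosen for illustration, and the real project may use a different language or data model.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Loan:
    """One borrowed book (hypothetical record layout, for illustration)."""
    title: str
    borrower: str
    due_date: date
    returned_on: Optional[date] = None

    def is_overdue(self, today: date) -> bool:
        # A loan is overdue only if it is still out and past its due date.
        return self.returned_on is None and today > self.due_date


def overdue_loans(loans: list[Loan], today: date) -> list[Loan]:
    """Return loans that are still out and past due, earliest due date first."""
    return sorted(
        (loan for loan in loans if loan.is_overdue(today)),
        key=lambda loan: loan.due_date,
    )


if __name__ == "__main__":
    today = date(2024, 5, 20)
    loans = [
        Loan("Dom Casmurro", "Ana", due_date=date(2024, 5, 10)),
        Loan("Momo", "Ben", due_date=date(2024, 5, 25)),
    ]
    for loan in overdue_loans(loans, today):
        days_late = (today - loan.due_date).days
        print(f"{loan.title} ({loan.borrower}): {days_late} days overdue")
```

Passing `today` as an argument instead of calling `date.today()` inside the method keeps the overdue check easy to test.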
Basic online banking tool featuring user registration, account management, and transfers. Focus on Python fundamentals, data structures, and clean code practices.
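A transfer between two accounts might be sketched as follows. This is an illustrative example under stated assumptions, not the project's implementation: `Account`, `InsufficientFunds`, and `transfer` are hypothetical names, and `Decimal` is assumed here to avoid floating-point rounding on money amounts.

```python
from decimal import Decimal


class InsufficientFunds(Exception):
    """Raised when a withdrawal or transfer exceeds the balance."""


class Account:
    """Hypothetical in-memory account, for illustration only."""

    def __init__(self, owner: str, balance: Decimal = Decimal("0")) -> None:
        self.owner = owner
        self.balance = balance

    def deposit(self, amount: Decimal) -> None:
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.balance += amount

    def withdraw(self, amount: Decimal) -> None:
        if amount <= 0:
            raise ValueError("amount must be positive")
        if amount > self.balance:
            raise InsufficientFunds(f"{self.owner} has only {self.balance}")
        self.balance -= amount


def transfer(source: Account, target: Account, amount: Decimal) -> None:
    # Withdraw first: if validation fails, the target is never credited.
    source.withdraw(amount)
    target.deposit(amount)


if __name__ == "__main__":
    ana = Account("Ana", Decimal("100.00"))
    ben = Account("Ben")
    transfer(ana, ben, Decimal("30.50"))
    print(ana.balance, ben.balance)
```

Validating in `withdraw` before any balance changes means a rejected transfer leaves both accounts untouched.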
Data analysis of a Brazilian e-commerce dataset, including logistics evaluation, revenue insights, operational reliability, and product value performance using SQL.
---
🤖 Generative AI Project Template (Forked)
Production-ready template for structuring Generative AI projects.
Used to study scalable project architecture, prompt workflows, and best practices for GenAI development.
📈 Yellowbrick (Forked)
Visual analysis and diagnostic tools for machine learning model selection.
Exploring model evaluation, performance visualization, and interpretability techniques.
📰 Fake News Detection with Machine Learning (Forked)
Machine learning pipeline for detecting fake news.
Used to explore text preprocessing, feature extraction, and supervised learning for NLP tasks.
- 🔤 Corpus Review Contributions
Reviewing and correcting linguistic dataset entries, focusing on metadata quality and consistency.
🔗 Mozilla Common Voice
- 🇧🇷 Portuguese – Native
- 🇬🇧 English – C1
- 🇩🇪 German – B2
- 🇪🇸 Spanish – C1
🎯 My goal is to help build responsible and inclusive language technologies that empower people through data and AI.