Genentech
Data Engineer Intern
Jun 2024 - Sep 2024
South San Francisco, CA
Built large-scale data pipelines for data products and an internal AI chatbot powered by Retrieval-Augmented Generation (RAG)
Python · PySpark · SQL · Spark SQL · Snowflake · AWS Glue · AWS Lambda · AWS S3 · AWS Redshift · AWS Athena · Retrieval-Augmented Generation (RAG)
Data Pipeline Development & Integration
Snowflake → AWS S3 Data Movement
- Built AWS Glue jobs using PySpark and Spark SQL to extract data from Snowflake, transform it into required schemas, and load it into Amazon S3 (sketch after this list)
- Implemented CSV-to-Parquet conversions for more efficient querying and storage
- Automated ingestion processes, reducing manual steps and improving pipeline reliability
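A minimal sketch of this extract-and-load pattern, assuming the Snowflake Spark connector is attached to the Glue job; account, credential, table, and bucket names are hypothetical placeholders:

```python
# AWS Glue (PySpark) sketch: Snowflake -> S3. All connection values are
# hypothetical; in practice credentials come from AWS Secrets Manager.
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session

sf_options = {
    "sfURL": "example_account.snowflakecomputing.com",
    "sfDatabase": "BILLING_DB",
    "sfSchema": "RAW",
    "sfWarehouse": "ETL_WH",
    "sfUser": "etl_user",
    "sfPassword": "***",
}

# Extract: read a Snowflake table through the Spark Snowflake connector
df = (
    spark.read.format("net.snowflake.spark.snowflake")
    .options(**sf_options)
    .option("dbtable", "BILLING_DOCUMENTS")
    .load()
)

# Load: write columnar Parquet to S3 for cheaper storage and faster scans
df.write.mode("overwrite").parquet("s3://example-bucket/billing_documents/")
```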
ETL Process Implementation
- Implemented end-to-end ETL: extracted raw SAP/Snowflake data, transformed it (column parsing, splitting, and formatting), and loaded it into target systems (transform sketch below)
- Worked on data ingestion flows for ValueTrak billing and other IRIS-related data products
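A sketch of the transform step on a hypothetical raw SAP extract with a combined `MATERIAL|PLANT` key and SAP-style `YYYYMMDD` date strings; paths and field names are assumed for illustration:

```python
# PySpark transform sketch: split a composite key, parse SAP date strings,
# and convert the CSV landing file to Parquet. Names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("sap_transform").getOrCreate()

raw = spark.read.option("header", True).csv("s3://example-bucket/raw/sap_extract.csv")

clean = (
    raw
    # Column splitting: break the composite key into business-ready columns
    .withColumn("material", F.split("material_plant", r"\|").getItem(0))
    .withColumn("plant", F.split("material_plant", r"\|").getItem(1))
    # Formatting: parse SAP YYYYMMDD strings into proper dates
    .withColumn("billing_date", F.to_date("billing_date", "yyyyMMdd"))
    .drop("material_plant")
)

# CSV-to-Parquet conversion for more efficient querying and storage
clean.write.mode("overwrite").parquet("s3://example-bucket/curated/sap_extract/")
```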
Retrieval-Augmented Generation (RAG) System
AspireGPT Application Development
- Engineered a ChatGPT-style application that combined SAP cloud data with Azure OpenAI to help 100+ interns navigate updated SAP data flows and mapping requirements
- Designed and implemented the RAG architecture to retrieve relevant SAP documentation and system information, then generate contextual responses with Azure OpenAI (sketch after this list)
- Enhanced intern efficiency by providing instant access to complex SAP data systems, reducing time spent searching through documentation and improving understanding of new data flows
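A minimal sketch of the retrieve-then-generate loop, using the `openai` Python SDK's AzureOpenAI client; the endpoint, deployment names, and the tiny in-memory document store are hypothetical stand-ins for the real SAP corpus:

```python
# RAG sketch: embed documents, retrieve by cosine similarity, answer with
# the retrieved context. All names and endpoints are hypothetical.
import numpy as np
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://example.openai.azure.com",
    api_key="***",
    api_version="2024-02-01",
)

docs = [
    "Billing documents flow from SAP S/4HANA into Snowflake nightly.",
    "ValueTrak mappings require billing type and sales org fields.",
]

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vectors = embed(docs)

def answer(question, k=1):
    q = embed([question])[0]
    # Cosine similarity between the question and every stored document
    sims = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    context = "\n".join(docs[i] for i in np.argsort(sims)[::-1][:k])
    chat = client.chat.completions.create(
        model="gpt-4o",  # hypothetical deployment name
        messages=[
            {"role": "system", "content": f"Answer using this SAP context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return chat.choices[0].message.content

print(answer("Where do billing documents land after SAP?"))
```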
SQL Development & Data Modeling
Snowflake SQL Optimization
- Wrote and optimized complex SQL statements to extract, join, and transform data from multiple Snowflake schemas
- Designed queries to map raw SAP data to ValueTrak Billing Document specifications, handling null values and field mismatches (example query below)
- Verified query outputs with Oracle SQL developers to ensure consistency between systems
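A sketch of this mapping style, written as Spark SQL; the `raw_sap` schema, tables, and columns are hypothetical and assumed to already exist in the session catalog:

```python
# Spark SQL sketch of a ValueTrak-style mapping query with null handling.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("valuetrak_mapping").getOrCreate()

billing = spark.sql("""
    SELECT
        COALESCE(b.billing_doc_id, 'UNKNOWN')  AS billing_document,  -- null handling
        TRIM(b.sales_org)                      AS sales_org,
        c.customer_name,
        CAST(b.net_value AS DECIMAL(18, 2))    AS net_value
    FROM raw_sap.billing_header b
    LEFT JOIN raw_sap.customer_master c
        ON b.customer_id = c.customer_id       -- keep unmatched billing rows
    WHERE b.billing_date >= DATE'2024-06-01'
""")
billing.show()
```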
Data Warehouse Architecture
- Applied fact & dimension table concepts for analytics, differentiating between transactional facts and descriptive dimensions
- Used SQL string and date functions (SPLIT_PART, DATE_FORMAT, LPAD, etc.) to create business-ready columns such as calendar months and weeks (example below)
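A self-contained Spark SQL example of those helpers (SPLIT_PART needs Spark 3.3+); the sample row is made up:

```python
# Spark SQL sketch: SPLIT_PART / DATE_FORMAT / LPAD building calendar
# columns from a fabricated sample row.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("calendar_columns").getOrCreate()
spark.sql(
    "SELECT DATE'2024-06-15' AS billing_date, 'US|WEST|CA' AS region_path"
).createOrReplaceTempView("billing")

spark.sql("""
    SELECT
        SPLIT_PART(region_path, '|', 1)       AS country,         -- 'US'
        DATE_FORMAT(billing_date, 'yyyy-MM')  AS calendar_month,  -- '2024-06'
        LPAD(CAST(WEEKOFYEAR(billing_date) AS STRING), 2, '0') AS calendar_week
    FROM billing
""").show()
```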
Machine Learning & System Integration
Dataiku ML Pipeline Training
- Built a predictive model to detect fraudulent job postings, starting with comprehensive data cleansing
- Applied Python scripts to handle missing values, generate derived columns, and prepare datasets for modeling (sketch after this list)
- Implemented feature engineering techniques including column splitting, text standardization, and data validation
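A compact sketch of that cleansing and feature-engineering flow on a made-up job-postings frame; the real pipeline ran inside Dataiku, and the columns and rows here are fabricated for shape only:

```python
# Fraud-posting model sketch: fill missing values, standardize text, derive
# features, then fit a simple classifier. Sample data is fabricated.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

df = pd.DataFrame({
    "title": ["Data Engineer", "WORK FROM HOME $$$", None, "Analyst"],
    "location": ["US, CA, SF", "US, NY, NYC", "US, TX, Austin", None],
    "fraudulent": [0, 1, 0, 0],
})

# Data cleansing: handle missing values and standardize text
df["title"] = df["title"].fillna("unknown").str.lower().str.strip()

# Feature engineering: split the location string, flag missing fields
df["state"] = df["location"].str.split(",").str[1].str.strip()
df["has_location"] = df["location"].notna().astype(int)
df["title_len"] = df["title"].str.len()

X = df[["has_location", "title_len"]]
y = df["fraudulent"]

model = RandomForestClassifier(random_state=0).fit(X, y)
print(model.predict(X))  # sanity check on the training frame
```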
Enterprise System Integration
- Collaborated with IT leads to compile metadata for partner systems and mapped interfaces for ASPIRE's SAP S/4HANA migration
- Attended Palantir Foundry trainings to understand enterprise data integration capabilities