I am Erchi Zhang, and you can call me Archer. I am a recent graduate with an M.S. in Data Science from New York University and a B.S. in Computer Science from Brandeis University. I am actively seeking 2025 new grad opportunities in data science, software development engineering, machine learning, data analysis, or related fields. I am proficient in Python, Java, JavaScript, R, SQL, HTML/CSS, and shell scripting. During my undergraduate studies, I coauthored two data science research papers: one on using Graph adaption BERT to detect malicious behavior on Twitter, and another on assisting GNNs in learning distinguishable representations without unbiased attributes. My GitHub repositories mainly contain projects I completed during my graduate studies:
- Paper Gist: A Cost-Efficient Cloud-Native Research Paper Summarization Platform, in which I built a serverless, cloud-native platform on AWS using Lambda, API Gateway, EC2 (for inference), SQS, EventBridge, S3, and DynamoDB, with content-based deduplication, to enable scalable, cost-efficient research paper summarization.
- Convert Deck to CPT Codes: AI-Driven Reimbursement Code Discovery for Health Tech Startups, in which I built a website that processes PDF pitch decks and returns relevant Current Procedural Terminology (CPT) codes using AI. We applied Named Entity Recognition (NER) to extract key information from the PDFs and used Retrieval-Augmented Generation (RAG) to produce accurate CPT code recommendations.
- Fixplainer: Failure Explainer for Multiple Object Tracking (MOT), in which I developed a GUI tool with teammates that uses SHAP explainers to explain failures in MOT tasks. Our paper can be found here.
- JEPA Model for Agent Trajectory Prediction, in which I implemented and trained a recurrent JEPA model to predict the trajectories of moving agents. This was the final project for NYU's DS-GA 1008 Deep Learning course; our model's performance ranked in the 1st quartile of the class, earning us a full score on the project.
- Billionaire Data Analysis, in which I collaborated with teammates to conduct a comprehensive data analysis in Python on a Billionaires Statistics Dataset from Kaggle.
- Spotify Songs Data Analysis, in which I collaborated with teammates to analyze a Spotify songs dataset. To answer each of the given questions, we applied techniques including multiple linear regression, lasso/ridge regression, significance tests, PCA, K-means clustering, logistic regression, SVM, Random Forest, an MLP neural network, and a recommendation system.
- Data Visualizations, in which I used D3.js, NotebookJS, and several Python tools to plot static and dynamic graphs for machine learning visualizations.
