Xinyu (Xiyah) Chang (she/her)
Bio
I am an aspiring data scientist pursuing an M.S.E. in Data Science at Johns Hopkins University. I have a strong academic background in Economics, Mathematics, and Data Science from the University of Washington, where I developed skills in machine learning, data visualization, AI, and NLP. I am proficient in Java, Python, SQL, and R. I am fluent in English and a native Chinese speaker. I am eager to collaborate on boundary-pushing data science projects and research. Beyond the academic and professional realm, I am a dedicated volunteer and leader who has served in a variety of roles.
You are welcome to explore my Bachelor’s and Master’s school life through my timelines.
My research interests:
- Multilingual LLMs
- Multi-modal LLMs
- Safety and reliability of AI
- Data-to-text generation
For details on my research work, see these pages.
As a gamer outside of school, I would like to explore applications of large language models (LLMs) and other NLP techniques in the gaming industry, including but not limited to:
- Game localization
- Ethical issues in games
- Text-to-image models for generating character designs, using retrieval to ensure historically accurate elements
- Fine-tuning models to build better NPC conversations with players
Video games I have played (including but not limited to):
- Overwatch 2, Apex Legends, Valorant, Counter-Strike 2, Battlefield V
- League of Legends
- Genshin Impact, Wuthering Waves
- Cyberpunk 2077, Elden Ring, Europa Universalis IV, Red Dead Redemption 2
- Terraria, The Legend of Zelda
Publications
2024
Music Emotion Prediction Using Recurrent Neural Networks
Xinyu Chang, Xiangyu Zhang, Haoruo Zhang, Yulu Ran.
2022
Elevator Optimization: Application of Spatial Process and Gibbs Random Field Approaches for Dumbwaiter Modeling and Multi-Dumbwaiter Systems
Zheng Cao, Benjamin Lu Davis, Wanchaloem Wunkaew, Xinyu Chang.
Projects
2024
May
Multilingual Evidence Against Hallucinations
- Professor: Philipp Koehn @ JHU
- Type: Summer Research
- Brief: Building on current techniques, we aim to improve how a QA system retrieves resources written in different languages for a given query. We proposed a Retrieval-Augmented Generation (RAG) system that combines Meta’s LASER encoder with the Llama 3 large language model, trained on the MegaWika dataset, which contains QA pairs generated from multilingual Wikipedia articles and their reference articles, organized in English.
- Specific Updates: Notes
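As a rough illustration of the retrieval step described in the brief, the sketch below ranks passages written in different languages by cosine similarity to a query embedding and packs the top hits into a prompt. It assumes the query and every passage have already been encoded into one shared, language-agnostic vector space (the role LASER plays in the brief). The three-dimensional vectors, passage IDs, and prompt template are hypothetical toy values, not the project's actual data or code.

```python
import math
from typing import List, Sequence, Tuple

# (passage_id, language, text, embedding); embeddings are toy values standing in
# for vectors from a multilingual sentence encoder such as LASER (assumption).
Passage = Tuple[str, str, str, Sequence[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query_vec: Sequence[float], passages: List[Passage], k: int = 2) -> List[Passage]:
    """Return the k passages most similar to the query, regardless of their language."""
    ranked = sorted(passages, key=lambda p: cosine(query_vec, p[3]), reverse=True)
    return ranked[:k]


def build_prompt(question: str, evidence: List[Passage]) -> str:
    """Pack retrieved passages into a simple, hypothetical prompt for an LLM such as Llama 3."""
    lines = [f"[{lang}] {text}" for _pid, lang, text, _vec in evidence]
    return "Evidence:\n" + "\n".join(lines) + f"\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    passages = [
        ("de-1", "de", "Der Eiffelturm steht in Paris.", [0.8, 0.2, 0.1]),
        ("fr-1", "fr", "La tour Eiffel mesure environ 330 mètres.", [0.85, 0.05, 0.0]),
        ("zh-1", "zh", "长城位于中国北部。", [0.0, 0.2, 0.9]),
    ]
    query_vec = [0.9, 0.1, 0.0]  # toy embedding of "How tall is the Eiffel Tower?"
    top = retrieve(query_vec, passages)
    print([pid for pid, *_ in top])  # ['fr-1', 'de-1']
    print(build_prompt("How tall is the Eiffel Tower?", top))
```

Because the ranking only compares vectors, a French and a German passage can both be retrieved for an English question, which is the point of using a shared multilingual encoder.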
March
MambaDiff: Seq2Seq Models with Diffusion Model and Mamba Architectures
- Instructor: Daniel Khashabi @ JHU
- Type: Course Project
- Course: EN.601.671 Natural Language Processing: Self-Supervised Models
- Brief: We are interested in the newly published Mamba architecture, which builds on selective state space models, and in whether it performs well when combined with diffusion models, which are usually used for visual tasks, and applied to language tasks.
- Specific Updates: Notes
January
Music Emotion Recognition Using RNNs
- Instructor: Anthony Kearsley @ JHU
- Type: Course Project
- Course: EN.553.602 Research and Design in Applied Mathematics: Data Mining
- Brief: Did you know that Russell’s emotion quadrant lets you place emotions in a 2-D plane, where the x-axis represents valence and the y-axis represents arousal? In this research project, we handled challenges posed by our limited dataset (900 30-second audio clips), limited computing resources (Google Colab’s coding environment), and model selection. We chose the Librosa package to embed the audio clips.
- Specific Updates: Notes
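A minimal sketch of how a predicted (valence, arousal) pair could be mapped to one of Russell’s four quadrants, for example after a model outputs the two values for a clip. The `midpoint` parameter, the quadrant numbering, and the label names are illustrative assumptions; the project’s actual rating scale and labels are not stated here.

```python
# Hypothetical labels following the usual reading of Russell's circumplex model.
QUADRANT_LABELS = {
    1: "happy/excited",  # high valence, high arousal
    2: "angry/tense",    # low valence, high arousal
    3: "sad/bored",      # low valence, low arousal
    4: "calm/relaxed",   # high valence, low arousal
}


def russell_quadrant(valence: float, arousal: float, midpoint: float = 0.0) -> int:
    """Map a (valence, arousal) point to quadrant 1-4.

    `midpoint` is the neutral value on both axes (assumed: 0 for centred scores,
    5 for ratings on a 1-9 scale). Points exactly on an axis count as "high".
    """
    high_valence = valence >= midpoint
    high_arousal = arousal >= midpoint
    if high_arousal:
        return 1 if high_valence else 2
    return 4 if high_valence else 3


if __name__ == "__main__":
    print(russell_quadrant(0.6, 0.7))  # 1
    print(QUADRANT_LABELS[russell_quadrant(2.0, 3.0, midpoint=5.0)])  # sad/bored
```

Predicting the two continuous values and deriving the quadrant afterwards keeps the finer-grained information; classifying directly into four quadrants is the other common design.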
2022
March
Model Selection for Online Course Participation
- Instructor: Pemi Nguyen @ UW
- Type: Course Project
- Course: STAT/CSE 416 Intro. Machine Learning
- Brief: Our goal was to use at least three different kinds of machine learning models to predict students’ online course completion: given each student’s course participation and some demographic information, predict whether they would complete the course and earn the certificate. We used neural networks, k-nearest neighbors, and random forests.
- Specific Updates: Notes
2021
October
Vaccine Scheduler Database
- Instructor: Ryan Maas @ UW
- Type: Course Project
- Course: CSE 414 Database Systems
- Brief: This project had us implement the backend of a vaccine scheduler in Python with SQL commands, connected to a Microsoft Azure server.
- Specific Updates: Notes
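The core of such a scheduler is a reservation that must either fully succeed or change nothing. The sketch below uses Python’s built-in `sqlite3` module as a local stand-in for the Azure-hosted database, and the `Vaccines`, `Availabilities`, and `Appointments` tables, as well as the `reserve` helper, are a hypothetical design rather than the course’s actual schema.

```python
import sqlite3
from typing import Optional


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a hypothetical three-table schema for doses, caregiver slots, and bookings."""
    conn.executescript("""
        CREATE TABLE Vaccines (name TEXT PRIMARY KEY, doses INTEGER NOT NULL CHECK (doses >= 0));
        CREATE TABLE Availabilities (day TEXT, caregiver TEXT, PRIMARY KEY (day, caregiver));
        CREATE TABLE Appointments (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            patient TEXT NOT NULL, caregiver TEXT NOT NULL,
            vaccine TEXT NOT NULL REFERENCES Vaccines(name), day TEXT NOT NULL
        );
    """)


def reserve(conn: sqlite3.Connection, patient: str, vaccine: str, day: str) -> Optional[int]:
    """Book the first free caregiver on `day` and use one dose; return the appointment id or None."""
    with conn:  # single transaction: commit on normal exit, roll back on exception
        row = conn.execute(
            "SELECT caregiver FROM Availabilities WHERE day = ? ORDER BY caregiver LIMIT 1",
            (day,),
        ).fetchone()
        if row is None:
            return None  # no caregiver free that day; nothing was modified
        caregiver = row[0]
        # Decrement only if a dose is left; rowcount tells us whether it happened.
        updated = conn.execute(
            "UPDATE Vaccines SET doses = doses - 1 WHERE name = ? AND doses > 0", (vaccine,)
        ).rowcount
        if updated == 0:
            return None  # out of doses; nothing was modified
        conn.execute(
            "DELETE FROM Availabilities WHERE day = ? AND caregiver = ?", (day, caregiver)
        )
        cur = conn.execute(
            "INSERT INTO Appointments (patient, caregiver, vaccine, day) VALUES (?, ?, ?, ?)",
            (patient, caregiver, vaccine, day),
        )
        return cur.lastrowid


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    conn.execute("INSERT INTO Vaccines VALUES ('SampleVax', 1)")
    conn.execute("INSERT INTO Availabilities VALUES ('2021-10-05', 'caregiver_a')")
    conn.commit()
    print(reserve(conn, "patient_1", "SampleVax", "2021-10-05"))  # 1
    print(reserve(conn, "patient_2", "SampleVax", "2021-10-05"))  # None
```

Wrapping the steps in `with conn:` commits them together on success and rolls them back if an exception is raised. Each failure path returns before any row is changed, so a failed booking never leaves a dose decremented without a matching appointment.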
News
2024
May - I joined Professor Philipp Koehn’s research group to explore multilingual information retrieval.
2023
August - I enrolled at Johns Hopkins University to pursue a Master of Science in Engineering (M.S.E.) in Data Science.
June - I graduated from the University of Washington, Seattle, with a four-year Bachelor of Arts, double majoring in Mathematics and Economics with a minor in Data Science, which follows my interests.
Education
Master's Degree - Johns Hopkins University, Baltimore | Aug. 2023 - Present (expected May 2025)
Data Science major. Relevant coursework and projects: see this link.
Bachelor's Degree - University of Washington, Seattle | Aug. 2019 - Jun. 2023
Economics and Mathematics double major, Data Science minor. Relevant coursework and projects: see this link.
High School - Flintridge Sacred Heart Academy, La Cañada Flintridge | Aug. 2016 - Jun. 2019