# Personal Information
## Identification
Singapore Permanent Resident|Chinese citizen

## Address
17 Jalan Masjid, Singapore

## Contact
yingxu.he1998@gmail.com|+65 91752741|+86 15063250971

# Working Experience
## Machine Learning Engineer at Huawei Ltd.
• from Dec 2022 to present

• Built a pipeline to automatically visualize data tables using an LSTM network trained on ChatGPT-generated
data with a pairwise loss, achieving 80% recall@5 on 100+ internal test cases.

• Designed and implemented a novel SISR method that enhanced Wi-Fi signal simulations for office buildings,
achieving a 10x speedup over physics-based simulation with negligible loss in accuracy (1% MAE)
on over 80 large-scale office layouts.

## Machine Learning Research Engineer at Dyson Ltd.
• from Sept 2021 to Dec 2022

• Implemented an object localization model in a few-shot context using semi-supervised training. The model
achieved results comparable to professional software, with improved adaptability and robustness.

• Designed and implemented an air quality estimation model using LGBM, Bayesian regression, etc., with
geographical and meteorological features. Demonstrated its advantages over spatial interpolation methods
and deployed the pipeline with the Metaflow framework on AWS services.

## ML Research Assistant at NUS-Singtel Cyber Security Lab
• from Sept 2020 to July 2021

• Identified anomalies in system logs using DBSCAN and hierarchical clustering for model training.

• Developed an information retrieval method for web-attack strategy identification from system and firewall
logs. The method achieved 80% recall@3 on 100+ hand-labelled samples.

## Data Analyst Intern at GIC Pte. Ltd.  
• from Dec 2018 to July 2019

• Deployed an R application that forecasts the mid-term returns of a portfolio, with visualization built in R Shiny.

• Optimized the coefficients of a mean-reversion forecasting model using a genetic algorithm.

## Data Analyst Intern at PropertyGuru
• from May 2018 to Aug 2018

• Developed dashboards in Tableau to analyze user behavior and listings’ performance to better match
user demand to agents’ recommendations.

• Implemented a POC to calculate and geographically visualize the liveability score for properties.

# Education
## Master of Computing in Artificial Intelligence at National University of Singapore
• from Aug 2020 to Sept 2021
• School of Computing: CAP 4.42/5.0
• Teaching Assistant: Advanced Analytics and Machine Learning (from Jan 2021 to May 2021)

## Bachelor of Science (Hons) in Business Analytics at National University of Singapore
• from Aug 2016 to June 2020
• School of Computing: CAP 4.15/5.0, Dean’s List in Semester 3 AY 2018/2019
• Distinction: Analytics Techniques Knowledge Area (awarded in Dec 2020)
• Teaching Assistant: Programming Methodology in Python (from Aug 2017 to June 2018)

# Relevant Projects
## Distilling ChatGPT for fine-tuning image captioning models
• from Jan 2023 to present
• Applied Chain-of-Thought prompting with verification on ChatGPT to create 10k+ accurate
captions from the xView annotations. Fine-tuned a GIT image captioning model, improving
the CIDEr score from 11.59 to 85.93 on 2k RSICD samples.
## Dialogue Response Generation (Master's Thesis) at NUS NExT++ Lab
• from Nov 2020 to Aug 2021
• Built an enriched task-oriented response generation model by implementing a copy mechanism on GPT-2 using
PyTorch. The proposed model can naturally incorporate external tips and user reviews about venues
into responses. The generated responses outperform many state-of-the-art models on user satisfaction.
## Property Resale Price Prediction
• from Jan 2021 to May 2021
• Fitted CatBoost, LGBM, and XGBoost models on 43k property sales records. Selected features by correlation and
information gain. Engineered new features describing properties’ livability. Reduced data dimensionality
with WOE encoding. The final ensemble’s accuracy ranked 5th out of 64.

# Skills
• Python (PyTorch, TensorFlow), R: Machine Learning, Deep Learning, Data Processing
• SQL, Spark: Data querying and big data
• Tableau, Power BI: Visualization development
• Java, Git, Scala, JavaScript, HTML, CSS: Software Development