JAMSTEC Internship Report
(HIGUCHI, Course of Marine System Engineering)

2021/12/16

For one month from September to October, I participated in an internship hosted by the Japan Agency for Marine-Earth Science and Technology (JAMSTEC).

[Photo] During the internship, I commuted to JAMSTEC's Yokohama Institute every week.

In the internship, I worked on developing an image classification model that looks at clouds that might grow into a typhoon and predicts how many hours remain before the typhoon forms. Predicting typhoon formation is extremely important for planning disaster control measures. In the past, typhoons were predicted with mathematical models. Data-driven approaches, which predict typhoons from previously accumulated simulation and observation data, have since made it possible to compensate for the shortcomings of prediction based on mathematical models. I assisted with the research as an intern under the guidance of a scientist who is one of the leading experts in this field.

My main duties were model tuning and data preprocessing. Since the training data consisted of massive quantities of meteorological data and weather simulation data, I used JAMSTEC's supercomputer, Earth Simulator 4, to process it at high speed. The supercomputer was different from the computers I normally use, and I was not familiar with its system configuration, so it took me a while to learn how to use it. There was no GUI, and I found it difficult to do everything from the command line and to edit files in vi.

Once I got used to operating the Earth Simulator, I was able to process the data at high speed. The research can be framed as a deep learning task: given meteorological data as input, the model predicts how many hours before typhoon formation the corresponding clouds appeared. (In the training data, the number of hours before formation is encoded as a label, so it is actually a classification problem.) In addition, the amount of training data varied greatly from label to label. As a result, learning successfully from imbalanced data was a challenge for me in this internship.
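
To make the "labels instead of hours" idea concrete, here is a minimal sketch of how a lead time could be turned into a class label. The bin width, the number of classes, and the sample lead times are all hypothetical values chosen for illustration; the actual labeling scheme used in the research is not described in this report.

```python
from collections import Counter


def lead_time_label(hours_before_formation, bin_hours=12, n_classes=8):
    """Map a lead time in hours to an integer class label.

    bin_hours and n_classes are hypothetical values for illustration.
    Lead times beyond the last bin are grouped into the final class.
    """
    if hours_before_formation < 0:
        raise ValueError("lead time must be non-negative")
    return min(int(hours_before_formation // bin_hours), n_classes - 1)


lead_times = [3, 10, 14, 30, 200]               # made-up lead times in hours
labels = [lead_time_label(h) for h in lead_times]
label_counts = Counter(labels)                  # how the samples spread over classes
```

Counting the labels this way is also a quick check for the kind of imbalance described above.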

[Photo] In addition to the Earth Simulator, I was also able to use a high-performance workstation. (Photograph is partially blurred.)

One way to learn successfully from imbalanced data is a technique called data augmentation. When some labels have less data than others, the data with those labels is augmented, for example by adding noise or rotating it, to reduce the bias in the training data. Since this research involved two-dimensional data similar to images, I implemented a program that augmented the data for labels with few samples by combining three operations: ① adding noise, ② rotating, and ③ flipping. Although this simple image processing did increase the amount of training data, training on it did not produce good results. I therefore turned my attention from data preprocessing alone to the model implementation as well. However, the final day of the internship arrived while I was still in the middle of that work, so I was unable to see learning from the imbalanced data through to a successful result.
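
The three operations can be sketched roughly as follows. This is not the program written during the internship, only a minimal illustration under assumptions: rotation is limited to quarter turns, the noise is Gaussian with an arbitrary `sigma`, and the data is a small nested list so that the example runs with the standard library alone (an array library would be the usual choice in practice).

```python
import random


def add_noise(grid, sigma, rng):
    """Return a copy of grid with zero-mean Gaussian noise added to every cell."""
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in grid]


def rotate90(grid):
    """Rotate a 2D grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def flip_horizontal(grid):
    """Mirror a 2D grid left to right."""
    return [row[::-1] for row in grid]


def augment(grid, rng, sigma=0.01):
    """Make one new sample by combining rotation, flipping, and noise."""
    out = [list(row) for row in grid]
    for _ in range(rng.randrange(4)):   # 0 to 3 quarter turns (assumed rotation step)
        out = rotate90(out)
    if rng.random() < 0.5:              # flip about half of the time
        out = flip_horizontal(out)
    return add_noise(out, sigma, rng)   # sigma is an arbitrary illustrative value


rng = random.Random(0)                  # fixed seed so runs are repeatable
sample = [[0.0, 1.0], [2.0, 3.0]]       # made-up 2x2 "image"
augmented = [augment(sample, rng) for _ in range(5)]
```

Each call to `augment` produces one extra sample for a label with little data; the original sample is left unchanged.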

On the final day, I presented my approach to learning accurately from imbalanced data to my instructor and received feedback. The label with the most data had more than 10,000 items, while the label with the least had only about 100, a gap of roughly 10,000 items, or an imbalance on the order of 100 to 1. Balancing this data through augmentation will apparently require additional work.
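
A rough calculation shows the scale of the problem. The counts below are rounded from the figures above and the label names are made up. The second half shows inverse-frequency class weighting, a common alternative to augmentation that was not part of my work in the internship and is included only for comparison.

```python
import math

# Illustrative counts rounded from the text; the label names are hypothetical.
counts = {"majority": 10_000, "minority": 100}

target = max(counts.values())

# Extra augmented samples needed per original sample to match the largest class.
copies_per_sample = {
    label: math.ceil(target / n) - 1 for label, n in counts.items()
}

# Inverse-frequency class weights: total / (number of classes * class count).
total = sum(counts.values())
class_weights = {
    label: total / (len(counts) * n) for label, n in counts.items()
}
```

With these numbers, every sample of the smallest label would need about 99 augmented variants. If rotation is limited to quarter turns, a square grid has only eight distinct orientations under rotation and flipping, so most of those variants would differ from each other only by noise.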

Through this internship, I had the opportunity to assist with research at JAMSTEC. The challenge I was assigned was a task in which I had no experience, and solving it was directly connected to progress in the research. Accordingly, although I initially planned to work on the internship three days a week, looking back, I think I actually worked four or five days a week. Despite the extra working time, I was unable to solve the challenge by the final day. Nonetheless, I learned a lot from my instructor about how to approach research activities. Since I had never presented at a conference or submitted a paper myself, this internship at a research institution allowed me to reexamine how I approach my own research. In conclusion, I would like to thank my instructor for making this such an educational and productive month.

HIGUCHI, 2nd year student, Course of Marine System Engineering