Turning Your Projects into a Portfolio | GitHub README Writing Tips and a Portfolio Example

누군가의 이야기, June 19, 2023, 19:17

- This post summarizes my notes from the 'Google Advanced Data Analytics Professional Certificate' program.

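Listing projects on your resume is a start, but it is just as important to present them in portfolio form.

A resume changes often to match the job description of each role you apply for. An online portfolio only needs updating when you finish a project.

This post covers how to show technical skills and projects on your resume and how to build an online project portfolio.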

Add your technical skills and portfolio to your resume

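When you apply for a data analytics role, your resume is a prospective employer's first introduction to you.

When writing it, add the technical skills you gained from your certificate and portfolio projects to the "Skills" section.

Tailor the resume so that it includes the technical skills listed in the specific job description.

The following is an example of adding these technical skills to a resume: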

Example 1: Adding technical skills to your resume

Skills

Programming Languages and Tools: Python, Tableau, Excel
Python Packages: Scikit-Learn, NLTK, Pandas, Scipy, Seaborn
Machine Learning Models: Logistic Regression, Time Series, Natural Language Processing
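As a rough aid for tailoring the skills section to a posting, the comparison can be sketched in a few lines of Python. This is only an illustration: the function name `matched_skills` and the sample job description are made up for this post, and simple whole-word matching will miss synonyms such as "sklearn" for "Scikit-Learn".

```python
import re


def matched_skills(job_description: str, my_skills: list[str]) -> list[str]:
    """Return the skills from my_skills that appear in the job description.

    Matching is case-insensitive and only counts whole words, so "R" does not
    match inside "React". (Hypothetical helper, not part of the course.)
    """
    text = job_description.lower()
    found = []
    for skill in my_skills:
        # Require a non-alphanumeric boundary on both sides of the skill name.
        pattern = r"(?<![a-z0-9])" + re.escape(skill.lower()) + r"(?![a-z0-9])"
        if re.search(pattern, text):
            found.append(skill)
    return found


if __name__ == "__main__":
    # Made-up posting text, used only to demonstrate the helper.
    posting = "We need Python, pandas and Tableau experience; R is a plus."
    skills = ["Python", "Pandas", "Scikit-Learn", "Tableau", "Excel"]
    print(matched_skills(posting, skills))
```

Skills that the posting mentions and you genuinely have are the ones to move to the front of the "Skills" section.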

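After that, you can add a "Technical Projects", "Data Projects", or "Machine Learning Projects" section.

The section title is up to you and can follow the wording of the job description.

If you have several data projects covering dashboards, data analysis, modeling, and so on, you can title the section "Data Analytics Projects" and use bullet points to list the different types of projects.

The following is an example of how to list a project on a resume: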

Example 2: Add a technical project to your resume

Data Analytics Projects

Pediatric Pneumonia Image Classification:
- Utilized the Python Image Enhancer package to produce more images of different sizes and color palettes to train a model
- Built logistic regression (scikit-learn) and neural network (Keras/TensorFlow) models to classify X-ray images as showing pneumonia or not
- Evaluated model performance using recall and deployed the model with an interactive Flask application

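The technical skills and projects you add to your resume should also be reflected in your online project portfolio.

As you learn more programming languages, packages, and models, add them to your resume.

The projects in your online portfolio should demonstrate the same technical skills that you list on your resume.

The rest of this post covers where to build an online portfolio and what to include in it.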

Where to create your online portfolio

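There are many platforms where data professionals can upload an online portfolio.

The first step is to choose a platform that fits the type of project you want to showcase.

Google Sites works well for blog-style portfolios.

GitHub and Kaggle are better suited to hosting code-based portfolios.

Tableau Public is very useful for sharing visualizations.

GitHub is one of the most popular platforms for building an online project portfolio, so the rest of this post explains how to use it.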

What to add to your GitHub Portfolio

After setting up a GitHub account, you will need to create a separate repository for each individual project. Each repository will contain all of your project files and a README.md file. A README is a markdown-based text file that provides an overview of your project. The following sections are great to include in your README:

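Project Title
- Include a descriptive title that tells a prospective employer what kind of analysis and project this is. Avoid naming the project "Portfolio Project"; instead, add the modeling algorithm and the data used to the title. Example: "Natural Language Processing of Election Day Tweets."
Project Overview
- The project overview should consist of a few concise sentences describing the problem you solved, the data used in the project, and the modeling results.
Business Understanding
- Include a section that showcases the stakeholder(s) and the business problem you tried to solve. Feel free to cite research you did on the business problem here as well.
Data Understanding
- Describe the data used in the analysis, the time period it covers, and its limitations. Include the visualizations from your exploratory data analysis (EDA) as well.
Modeling and Evaluation
- Describe in detail the models you used and the evaluation metrics for each.
Conclusion
- Explain your recommendations for addressing the business problem and highlight the next steps you would take to extend the project.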
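
The section list above can be turned into a starter README with a short script. The following is a minimal sketch, not an official template: the function name `build_readme`, the use of `#` and `##` markdown headings, and the placeholder hints are my own choices for illustration.

```python
# Section headings taken from the list above; the hints are short reminders
# of what each section should contain.
SECTIONS = [
    ("Project Overview", "Problem solved, data used, and modeling results."),
    ("Business Understanding", "Stakeholders and the business problem."),
    ("Data Understanding", "Data source, time period, limitations, EDA visuals."),
    ("Modeling and Evaluation", "Models used and their evaluation metrics."),
    ("Conclusion", "Recommendations and next steps."),
]


def build_readme(title: str) -> str:
    """Return markdown text for a README skeleton with the given project title."""
    lines = [f"# {title}", ""]
    for heading, hint in SECTIONS:
        # Each hint goes in an HTML comment so it stays invisible when rendered.
        lines += [f"## {heading}", "", f"<!-- {hint} -->", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    # Print the skeleton; redirect the output to README.md to save it.
    print(build_readme("Natural Language Processing of Election Day Tweets"))
```

Running the script and saving its output as README.md in the repository root gives you a file with every recommended section ready to fill in.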

For more information on how to craft README files, check out GitHub's "About READMEs" article.

The example below is the portfolio project I worked on during the 'Google Advanced Data Analytics Professional Certificate' program.

The GitHub repository README (shown below) uses the New York City Taxi & Limousine Commission data that you have seen in your portfolio project throughout the program. This example expands beyond what was given in your original project description by including domain knowledge under the "Business Understanding" section. Domain knowledge demonstrates to a prospective employer your ability to do research before conducting a technical analysis. It is important to create a clear, concise README that summarizes your business understanding and technical findings.

Example Project: README

Predicting Taxi Gratuities in New York City

Overview

The goal of this project was to build a multiple linear regression model and a random forest model to predict whether a rider would leave a high gratuity. The project used yellow taxi trips taken in New York City during 2017. The final random forest model achieved 86% accuracy and 72% precision and identified which features were most important in separating low tippers from high tippers. Based on the model, the duration, distance, and cost of the trip were most influential in distinguishing a generous tipper (>20%) from a non-generous one (<20%).

Business Understanding

According to salary.com the average salary for a New York Taxi Driver is around $45,000. This salary is significantly low compared to a median rent value of $6,500 per month. It is important to understand what factors encourage riders to leave tips in order to help drivers obtain a livable wage.

Data Understanding

The NYC Taxi and Limousine Commission data came from NYC.gov. The data consisted of approximately 408k unique trips and 18 features. The features included information on trip duration and destination, vendor used, toll information, and payment type. The bar chart below shows the breakdown of how many generous tippers (>20%) versus non-generous tippers that exist in the data set.

In connection to this, a feature was engineered to indicate whether a ride was taken during rush hour. Redundant columns were dropped, and the remaining columns were converted to the proper data types.

Modeling and Evaluation

A random forest model comprising 100 decision trees was used to determine which features were most important in predicting whether a rider would tip generously. The plot below shows that trip duration, distance, and fare cost were the three most important factors in distinguishing a generous tipper from a non-generous one. The overall model achieved 86% accuracy and 72% precision.

Conclusion

This model can help taxi drivers know whether they will be tipped generously; however, running a parametric model would help determine how much each variable influences the actual tip amount. In the future, adding more information on a rider's past tipping behavior may also be beneficial in helping the stakeholder address their business problem.
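
For readers less familiar with the two metrics quoted in the example README, accuracy and precision can both be computed from the four confusion-matrix counts. The sketch below uses plain Python and made-up counts; they are not the project's actual results, and the function names are my own.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Share of all predictions that were correct."""
    return (tp + tn) / (tp + fp + tn + fn)


def precision(tp: int, fp: int) -> float:
    """Share of predicted-positive cases (e.g. "generous tipper") that were truly positive."""
    return tp / (tp + fp)


if __name__ == "__main__":
    # Invented counts for illustration only:
    # tp = generous tippers predicted as generous, fp = non-generous predicted as generous,
    # tn = non-generous predicted as non-generous, fn = generous predicted as non-generous.
    tp, fp, tn, fn = 30, 10, 50, 10
    print(f"accuracy  = {accuracy(tp, fp, tn, fn):.2f}")
    print(f"precision = {precision(tp, fp):.2f}")
```

Stating which metric you report, and why it matters for the business problem, is what the "Modeling and Evaluation" section is for.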

In addition to the README file, your GitHub repository should contain the data you used, cleaned-up Python notebook files, a presentation, and any images you used.

Check out this additional resource from DataQuest that walks you through how to add files to your online GitHub portfolio. The goal is to have all project information in one repository that will help an employer understand your project, run your code, and clearly know your business recommendations.
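
Before pushing a project, it can help to check that the repository actually contains the pieces listed above. Here is a minimal sketch, assuming a layout of my own invention: the names `data`, `notebooks`, `presentation.pdf`, and `images` are hypothetical and should be changed to match your project.

```python
from pathlib import Path

# Hypothetical layout; only README.md is a name the post itself prescribes.
EXPECTED_ITEMS = ["README.md", "data", "notebooks", "presentation.pdf", "images"]


def missing_items(repo_dir: str) -> list[str]:
    """Return the expected files or folders that do not exist in repo_dir."""
    repo = Path(repo_dir)
    return [name for name in EXPECTED_ITEMS if not (repo / name).exists()]


if __name__ == "__main__":
    # Check the current directory and report anything that is missing.
    missing = missing_items(".")
    if missing:
        print("Missing from repository:", ", ".join(missing))
    else:
        print("All expected items are present.")
```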

Key Takeaways

You should review the job description's technical skills and add the applicable skills to your resume to increase your chances of being called for an interview.
Having your data projects on your resume is a great way to showcase your hands-on technical experience for various data roles.
GitHub is a great online platform for building an online portfolio of coding projects that can be seen by any prospective employer. Keep in mind that there are other platforms, such as Kaggle, Google Sites, Tableau Public, and Medium, that can showcase your technical writing, data visualization, and coding skills.
