[Python | Kaggle] Machine Learning Series: Pandas Basics Practice Exercises (3)

海轟Pro 2021-08-15 23:00:02


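Preface

Hello, everyone!
Thank you very much for reading 海轟's article. If you spot any mistakes, please point them out.

About me ଘ(੭ˊᵕˋ)੭
Nickname: 海轟
Tags: programmer | C++ enthusiast | student
Bio: I got into programming through C and later switched to a computer science major. I have been fortunate to win some national and provincial awards, and have been recommended for graduate admission. I am currently learning C++, Linux, and Python.
Study tips: build solid fundamentals, take plenty of notes, write plenty of code, think things through, and learn English well!

I am new to Python and still at the beginner stage.
This article is only my own study notes, meant for building up a knowledge base and for review.
It is not about how many exercises you do: study one, understand one.
Know what works, and know why it works!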

Introduction

Now you are ready to get a deeper understanding of your data.

Run the following cell to load your data and some utility functions (including code to check your answers).

Run the code below to load the required data and packages.

import pandas as pd
pd.set_option("display.max_rows", 5)
reviews = pd.read_csv("../input/wine-reviews/winemag-data-130k-v2.csv", index_col=0)
from learntools.core import binder; binder.bind(globals())
from learntools.pandas.summary_functions_and_maps import *
print("Setup complete.")
reviews.head()

The data used in this exercise: [image.png]
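The `index_col=0` argument in the setup cell is easy to overlook. The sketch below shows its effect on a tiny in-memory CSV. The two rows are a hypothetical stand-in, not the real contents of winemag-data-130k-v2.csv, and the assumption is that the first column of that file holds row labels rather than data.

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for the wine reviews CSV.
# The first column has no header and holds the row labels.
csv_text = """,country,points,price
0,Italy,87,
1,Portugal,87,15.0
"""

# index_col=0 turns the first column into the index instead of a data column.
sample = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(sample.columns.tolist())
print(sample.index.tolist())
```

Without `index_col=0`, the same file would be read with an extra data column holding the row labels.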

Exercises

1.

Question

What is the median of the points column in the reviews DataFrame?

Solution

What the question asks:

Find the median of the points column.

median_points = reviews.points.median()

Output: [image.png]
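To see what `median()` does without the Kaggle dataset, here is a minimal sketch on a hand-made Series. The scores are made up for illustration. It shows two behaviours: missing values are skipped by default, and an even number of values gives the mean of the two middle ones.

```python
import pandas as pd

# Hypothetical scores, not taken from the wine dataset.
points = pd.Series([80, 88, 90, 100, None])

# The non-missing values are 80, 88, 90, 100. With an even count,
# the median is the mean of the two middle values (88 and 90).
median_points = points.median()
print(median_points)  # 89.0
```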

2.

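Question

What countries are represented in the dataset? (Your answer should not include any duplicates.)

Solution

What the question asks:

Which countries are represented in the dataset, with no duplicates in the answer?
In other words, find every distinct value that appears in the country column.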

countries = reviews.country.unique()

Output: [image.png]
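A small sketch of `unique()` on made-up country values (not the real data) may help here. It returns the distinct values in order of first appearance, and a missing value counts as one of them, so dropping missing values first is one way to get only real country names.

```python
import pandas as pd

# Hypothetical country values, including one missing entry.
country = pd.Series(["Italy", "US", "Italy", None, "France", "US"])

# Distinct values in order of first appearance; the missing value is included.
countries = country.unique()
print(len(countries))

# Dropping missing values first leaves only the country names.
without_missing = country.dropna().unique()
print(list(without_missing))
```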

3.

Question

How often does each country appear in the dataset? Create a Series reviews_per_country mapping countries to the count of reviews of wines from that country.

Solution

What the question asks:

Count how many times each country appears.

reviews_per_country = reviews.country.value_counts()

Output: [image.png]
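The following sketch runs `value_counts()` on a few made-up values rather than the wine data. By default the result is sorted from most to least frequent and leaves out missing values. Passing `dropna=False` counts them as well.

```python
import pandas as pd

# Hypothetical country values, including one missing entry.
country = pd.Series(["US", "Italy", "US", None, "France", "US", "Italy"])

# Counts per country, most frequent first; the missing value is ignored.
reviews_per_country = country.value_counts()
print(reviews_per_country)

# dropna=False adds a row that counts the missing values too.
with_missing = country.value_counts(dropna=False)
print(with_missing.sum())  # 7
```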

4.

Question

Create variable centered_price containing a version of the price column with the mean price subtracted.

(Note: this ‘centering’ transformation is a common preprocessing step before applying various machine learning algorithms.)

Solution

What the question asks:

For every value in the price column, compute its difference from the mean of the price column.

centered_price = reviews.price-reviews.price.mean()

Output: [image.png]
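As a quick check of what centering does, the sketch below uses made-up prices. Subtracting a scalar from a Series applies to every element, `mean()` skips missing values, and a missing price stays missing after centering. The centered values then average to zero.

```python
import pandas as pd

# Hypothetical prices, including one missing entry.
price = pd.Series([10.0, 20.0, None, 30.0])

# mean() skips the missing value, giving 20.0; the subtraction is
# applied to every element, and NaN stays NaN.
centered_price = price - price.mean()
print(centered_price.tolist())

# The centered values average to (approximately) zero.
print(centered_price.mean())
```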

5.

Question

I’m an economical wine buyer. Which wine is the “best bargain”? Create a variable bargain_wine with the title of the wine with the highest points-to-price ratio in the dataset.

Solution

What the question asks:

Find the title of the wine with the best value for money.
Value for money: points / price

bargain_idx = (reviews.points / reviews.price).idxmax()
bargain_wine = reviews.loc[bargain_idx, 'title']

Output: [image.png]
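The two-step lookup can be tried on a toy DataFrame. The wines, titles, and index labels below are hypothetical. The point of the sketch is that `idxmax()` returns an index label, not a position, which is why the lookup uses `loc`. A row with a missing price gives a missing ratio and is skipped by default.

```python
import pandas as pd

# Hypothetical wines with non-default index labels.
wines = pd.DataFrame(
    {
        "title": ["Wine A", "Wine B", "Wine C", "Wine D"],
        "points": [90, 86, 88, 92],
        "price": [30.0, 4.0, None, 8.0],
    },
    index=[10, 11, 12, 13],
)

# Ratios are 3.0, 21.5, NaN, 11.5; the NaN is ignored by idxmax().
ratio = wines.points / wines.price
bargain_idx = ratio.idxmax()  # an index label, here 11

# loc looks the row up by that label.
bargain_wine = wines.loc[bargain_idx, "title"]
print(bargain_idx, bargain_wine)  # 11 Wine B
```

If several wines share the highest ratio, `idxmax()` returns only the first of them.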

6.

Question

There are only so many words you can use when describing a bottle of wine. Is a wine more likely to be “tropical” or “fruity”? Create a Series descriptor_counts counting how many times each of these two words appears in the description column in the dataset.

Solution

What the question asks:

Count how many times "tropical" and "fruity" each appear in the description column.
Return the result as a Series.

n_trop = reviews.description.map(lambda desc: "tropical" in desc).sum()
n_fruity = reviews.description.map(lambda desc: "fruity" in desc).sum()
descriptor_counts = pd.Series([n_trop, n_fruity], index=['tropical', 'fruity'])

Output: [image.png]
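It is worth being clear about what this solution counts. The sketch below uses four made-up descriptions. The `in` test is case-sensitive and gives one True or False per row, so the sum is the number of descriptions that contain the word, not the total number of occurrences. The last lines show `str.contains` as one possible vectorized alternative; this is a swapped-in technique, not the solution used above.

```python
import pandas as pd

# Hypothetical descriptions.
description = pd.Series(
    [
        "Tropical fruit aromas",       # capital T: not matched
        "fruity, with tropical notes",
        "tropical and more tropical",  # counted once, as one row
        "dry and herbal",
    ]
)

# Each lambda returns True or False per row; sum() counts the True rows.
n_trop = description.map(lambda desc: "tropical" in desc).sum()
n_fruity = description.map(lambda desc: "fruity" in desc).sum()
descriptor_counts = pd.Series([n_trop, n_fruity], index=["tropical", "fruity"])
print(descriptor_counts)

# Alternative: a plain (non-regex) substring test with the .str accessor.
n_trop_vec = description.str.contains("tropical", regex=False).sum()
print(n_trop_vec)  # 2
```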

7.

Question

We’d like to host these wine reviews on our website, but a rating system ranging from 80 to 100 points is too hard to understand - we’d like to translate them into simple star ratings. A score of 95 or higher counts as 3 stars, a score of at least 85 but less than 95 is 2 stars. Any other score is 1 star.

Also, the Canadian Vintners Association bought a lot of ads on the site, so any wines from Canada should automatically get 3 stars, regardless of points.

Create a series star_ratings with the number of stars corresponding to each review in the dataset.

Solution

What the question asks:

points >= 95: 3 stars
points at least 85 and below 95: 2 stars
points below 85: 1 star
Special case: every wine whose country is Canada gets 3 stars

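def stars(row):
    # Canada is checked first so that it overrides the points rules.
    if row.country == 'Canada':
        return 3
    elif row.points >= 95:
        return 3
    elif row.points >= 85:
        return 2
    else:
        return 1

# axis='columns' passes one row at a time to stars().
star_ratings = reviews.apply(stars, axis='columns')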

Output: [image.png]
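The rating rules can be tested on a few made-up rows. The sketch below repeats the row-wise `stars` function and, as an assumption about what a faster version could look like, compares it with a vectorized variant built on `numpy.select`. The toy data and the name `star_ratings_vec` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical rows that cover every branch, plus a missing country.
wines = pd.DataFrame(
    {
        "country": ["Canada", "US", "Italy", "France", None],
        "points": [82, 96, 85, 84, 95],
    }
)


def stars(row):
    # Canada is checked first so that it overrides the points rules.
    if row.country == "Canada":
        return 3
    elif row.points >= 95:
        return 3
    elif row.points >= 85:
        return 2
    else:
        return 1


# Row-wise version: stars() is called once per row.
star_ratings = wines.apply(stars, axis="columns")

# Vectorized sketch: np.select takes the first condition that holds
# for each row and falls back to the default otherwise.
conditions = [
    ((wines.country == "Canada") | (wines.points >= 95)).to_numpy(),
    (wines.points >= 85).to_numpy(),
]
star_ratings_vec = pd.Series(
    np.select(conditions, [3, 2], default=1), index=wines.index
)

print(star_ratings.tolist())      # [3, 3, 2, 1, 3]
print(star_ratings_vec.tolist())  # [3, 3, 2, 1, 3]
```

On a dataset with many rows, the vectorized form avoids calling a Python function per row, which is usually faster, while the `apply` version stays closer to the wording of the question.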

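Closing remarks

This article is only a set of study notes, recording the process of going from 0 to 1.

I hope it helps. If you find any mistakes, corrections are welcome.

I am 海轟 ଘ(੭ˊᵕˋ)੭

If you think the article is decent, please give it a like.

Thanks for your support.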

Copyright notice: this article was written by [海轟Pro]. Please include a link to the original when reposting. Thank you. https://gsmany.com/2021/08/20210815230001640c.html