Huindungi Barks When He Codes (Woof!Woof!Woof!Woof!Woof!Woof!Woof!Woof!Woof!Woof!Woof!)
(Python, Assignment) Crawling Assignment 2
Assignment 1
- Crawl the store names and addresses from Banapresso (https://banapresso.com/) and export them to Excel
In [42]:

```
!pip install pandas
```

```
Requirement already satisfied: pandas in c:\users\acer\appdata\local\programs\python\python38\lib\site-packages (2.0.2)
WARNING: You are using pip version 20.2.1; however, version 23.1.2 is available.
You should consider upgrading via the 'c:\users\acer\appdata\local\programs\python\python38\python.exe -m pip install --upgrade pip' command.
Requirement already satisfied: python-dateutil>=2.8.2 in c:\users\acer\appdata\local\programs\python\python38\lib\site-packages (from pandas) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in c:\users\acer\appdata\local\programs\python\python38\lib\site-packages (from pandas) (2023.3)
Requirement already satisfied: numpy>=1.20.3; python_version < "3.10" in c:\users\acer\appdata\local\programs\python\python38\lib\site-packages (from pandas) (1.24.2)
Requirement already satisfied: tzdata>=2022.1 in c:\users\acer\appdata\local\programs\python\python38\lib\site-packages (from pandas) (2023.3)
Requirement already satisfied: six>=1.5 in c:\users\acer\appdata\local\programs\python\python38\lib\site-packages (from python-dateutil>=2.8.2->pandas) (1.16.0)
```
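In [1]:

```python
import pandas as pd
import chromedriver_autoinstaller
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from urllib.request import Request, urlopen

# Download a chromedriver matching the installed Chrome (if needed) and add it to PATH
chromedriver_autoinstaller.install()
```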
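In [2]:

```python
def store_append(dic_list):
    """Append a {'매장명': name, '주소': address} dict for every store on the current page."""
    # Container holding the store list (uses the global `driver` created in In [4])
    store_list_xpath = '/html/body/div/div/div/div/article/div/section/div/div[1]/div[2]'
    store_list = driver.find_element(By.XPATH, store_list_xpath)
    # Each store entry shows its name in <i> and its address in <span>
    store_names = store_list.find_elements(By.CSS_SELECTOR, '.store_name_map > i')
    store_addresses = store_list.find_elements(By.CSS_SELECTOR, '.store_name_map > span')
    for store_name, store_address in zip(store_names, store_addresses):
        dic = {'매장명': store_name.text, '주소': store_address.text}
        dic_list.append(dic)
```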
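`store_append()` pairs names and addresses with `zip()`. That only works while the two lists line up one-to-one. Rows 105, 106 and 109 in the output below have empty addresses. If a store's `<span>` were missing altogether instead of empty, `zip()` would shift every later address onto the wrong store and silently drop the last one.

The sketch below reads each store entry separately instead. It assumes each `.store_name_map` element holds one `<i>` (name) and one `<span>` (address). `first_text` and `collect_stores` are hypothetical helpers, and the string `"css selector"` is the value of Selenium's `By.CSS_SELECTOR`.

```python
# Value of selenium.webdriver.common.by.By.CSS_SELECTOR, written out so this
# sketch does not need Selenium installed
CSS_SELECTOR = "css selector"


def first_text(parent, selector):
    """Text of the first element matching `selector` inside `parent`, or '' if none."""
    matches = parent.find_elements(CSS_SELECTOR, selector)
    return matches[0].text if matches else ""


def collect_stores(store_list):
    """One dict per '.store_name_map' entry, so a missing address cannot shift later rows."""
    rows = []
    for entry in store_list.find_elements(CSS_SELECTOR, ".store_name_map"):
        rows.append({"매장명": first_text(entry, "i"), "주소": first_text(entry, "span")})
    return rows
```

On the live page this would be called as `collect_stores(driver.find_element(By.XPATH, store_list_xpath))`. The `"i"` and `"span"` selectors match any descendant of the entry, not only direct children as `> i` does.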
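In [3]:

```python
from selenium.common.exceptions import WebDriverException


def store_list():
    """Visit every page of the store list and return all store dicts."""
    dic_list = []
    pagination_xpath = '/html/body/div/div/div/div/article/div/section/div/div[1]/div[3]/ul'
    next_page_xpath = '/html/body/div/div/div/div/article/div/section/div/div[1]/div[3]/span[2]'
    while True:
        # Page-number links of the current page group
        pagination = driver.find_element(By.XPATH, pagination_xpath)
        pages = pagination.find_elements(By.CSS_SELECTOR, 'li > a')
        for i in range(len(pages) - 1):
            # The active page link has class "on": scrape it, then click the next number
            if pages[i].get_attribute('class') == 'on':
                store_append(dic_list)
                pages[i + 1].click()
                time.sleep(1)
        # The loop stops before the group's last page, so scrape that page here
        store_append(dic_list)
        # Go to the next page group; stop when the "next" arrow is missing or not clickable
        try:
            driver.find_element(By.XPATH, next_page_xpath).click()
        except WebDriverException:
            break
        time.sleep(1)
    return dic_list
```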
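The control flow of `store_list()` is easier to check away from the browser. The sketch below replays the same loop against a hypothetical in-memory pager (`FakePager`, `FakeLink` and `NoNextGroup` are made up for illustration).

In this model:

- Pages are shown in groups.
- The active link reports class `"on"`.
- Asking for the next group fails after the last one, the way the missing arrow makes `find_element` raise on the real page.

Like the original code, it assumes a link's class updates in place after a click.

```python
class NoNextGroup(Exception):
    """Raised when there is no next page group (stands in for the missing arrow)."""


class FakeLink:
    def __init__(self, pager, number):
        self.pager = pager
        self.number = number

    def get_attribute(self, name):
        # The active page link carries class "on", as on the real page
        if name == "class":
            return "on" if self.pager.current == self.number else ""
        return None

    def click(self):
        self.pager.current = self.number


class FakePager:
    """Hypothetical store list with `total` pages shown in groups of `group_size`."""

    def __init__(self, total, group_size):
        self.total = total
        self.group_size = group_size
        self.current = 1

    def _group_start(self):
        return (self.current - 1) // self.group_size * self.group_size + 1

    def page_links(self):
        start = self._group_start()
        end = min(start + self.group_size - 1, self.total)
        return [FakeLink(self, n) for n in range(start, end + 1)]

    def click_next_group(self):
        nxt = self._group_start() + self.group_size
        if nxt > self.total:
            raise NoNextGroup()
        self.current = nxt  # the first page of the new group becomes active


def walk_pages(pager, scrape):
    """Same loop as store_list(), with `scrape` in place of store_append()."""
    while True:
        pages = pager.page_links()
        for i in range(len(pages) - 1):
            if pages[i].get_attribute("class") == "on":
                scrape(pager.current)
                pages[i + 1].click()
        scrape(pager.current)  # last page of the group
        try:
            pager.click_next_group()
        except NoNextGroup:
            break
```

With 12 pages in groups of 5, the walk scrapes pages 1 to 12 exactly once each. The `for` loop covers every page except the last in a group, and the call after the loop picks that one up. If the real "next" arrow stayed clickable on the final group, the `while` loop would never end, so a page-count limit may be worth adding.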
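In [4]:

```python
driver = webdriver.Chrome()
driver.implicitly_wait(3)  # element lookups retry for up to 3 seconds before failing
url = 'https://banapresso.com/store'
driver.get(url)
time.sleep(3)  # give the store list time to render
stores = store_list()
```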
In [5]:

```python
df_store = pd.DataFrame(stores)
df_store
```
Out[5]:

| | 매장명 (store name) | 주소 (address) |
|---|---|---|
| 0 | 가산디지털단지역점 | 서울시 금천구 가산동 60-3 |
| 1 | 강남구청점 | 서울 강남구 청담동 45-4 |
| 2 | 강남역사거리점 | 서울특별시 강남구 역삼동 820-10 |
| 3 | 강남역점 | 서울 강남구 역삼동822-7 |
| 4 | 강남점 | 서울 강남구 테헤란로4길 46 (역삼동 826-37) 쌍용 플래티넘, 1층 |
| ... | ... | ... |
| 105 | 가산파트너스타워점 | |
| 106 | 구로우림1차점 | |
| 107 | 문정테라타워점 | 서울특별시 송파구 송파대로 167, 문정역테라타워 A동 G123호 |
| 108 | 시흥은계점 | 경기도 시흥시 은계번영길 11,111호 |
| 109 | 원주무실점 | |

110 rows × 2 columns
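Three stores in the result (rows 105, 106 and 109) came back with an empty address. A small check on the `stores` list can separate those rows so they can be looked up by hand or left out of the export. The sketch below uses a hypothetical `split_by_address` helper.

```python
def split_by_address(stores, column="주소"):
    """Split store dicts into (with_address, without_address) by whether the address is blank."""
    with_address, without_address = [], []
    for row in stores:
        address = (row.get(column) or "").strip()
        # Rows whose address is missing, None or only whitespace go to the second list
        (with_address if address else without_address).append(row)
    return with_address, without_address
```

For example, `complete, incomplete = split_by_address(stores)` followed by `pd.DataFrame(complete)` would build a table without the blank-address rows.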
In [6]:

```python
# Writing .xlsx needs an Excel writer engine such as openpyxl (pip install openpyxl)
df_store.to_excel('store.xlsx')
```
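`DataFrame.to_excel()` only works when an Excel writer such as openpyxl is installed. When a CSV file is enough, the standard `csv` module can write the same `stores` list. The `utf-8-sig` encoding puts a byte-order mark at the start of the file, which Excel uses to recognise UTF-8, so the Korean store names display correctly. `write_stores_csv` is a hypothetical helper.

```python
import csv


def write_stores_csv(stores, path, fieldnames=("매장명", "주소")):
    """Write store dicts to CSV; 'utf-8-sig' adds a BOM so Excel reads the file as UTF-8."""
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=list(fieldnames))
        writer.writeheader()
        writer.writerows(stores)
```

With pandas the equivalent is `df_store.to_csv('store.csv', index=False, encoding='utf-8-sig')`. Here `index=False` leaves out the 0 to 109 index column, which `to_excel('store.xlsx')` writes by default.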
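Assignment 2
- Pick an online shopping mall, choose some categories, crawl them, and save each category's product images into its own folder
- Create a table in MySQL and store the category and file path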
In [2]:

```python
import MySQLdb  # provided by the mysqlclient package
import os
import pathlib
```
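In [3]:

```python
class ShopDAO:
    """Small data-access object for the `shop` table in the local `kdt` database."""

    def __init__(self):
        self.db = None

    def connect(self):
        # host, user, password, database; utf8mb4 so Korean text is stored correctly
        self.db = MySQLdb.connect('localhost', 'root', '1234', 'kdt', charset='utf8mb4')

    def disconnect(self):
        self.db.close()

    def table(self):
        self.connect()
        cur = self.db.cursor()
        # IF NOT EXISTS: running this cell again does not fail when the table is already there
        sql = """
            CREATE TABLE IF NOT EXISTS shop (
                id INT PRIMARY KEY AUTO_INCREMENT,
                name VARCHAR(250),
                category VARCHAR(50),
                path VARCHAR(250)
            )
        """
        try:
            cur.execute(sql)
        except MySQLdb.Error as e:
            print('Failed to create table:', e)
        finally:
            self.db.commit()
            self.disconnect()

    def insert(self, name, category, path):
        self.connect()
        cur = self.db.cursor()
        # Parameterized query: the driver escapes the values
        sql = "INSERT INTO shop (name, category, path) VALUES (%s, %s, %s)"
        cur.execute(sql, (name, category, path))
        self.db.commit()
        self.disconnect()
```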
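`ShopDAO.insert()` opens and closes a MySQL connection for every image. A batched variant would open one connection, insert all rows with `executemany()`, and commit once.

The sketch below uses the standard-library `sqlite3` module as a stand-in so it runs without a MySQL server. `save_products` is a hypothetical helper. Note that sqlite3 uses `?` placeholders and `AUTOINCREMENT`, where MySQLdb uses `%s` and `AUTO_INCREMENT`.

```python
import sqlite3

# Same columns as the MySQL `shop` table, in SQLite syntax
SCHEMA = """
CREATE TABLE IF NOT EXISTS shop (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name VARCHAR(250),
    category VARCHAR(50),
    path VARCHAR(250)
)
"""


def save_products(conn, rows):
    """Insert (name, category, path) tuples over one connection and commit once."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(SCHEMA)
        conn.executemany("INSERT INTO shop (name, category, path) VALUES (?, ?, ?)", rows)
```

With MySQLdb, the same idea is `cur.executemany("INSERT INTO shop (name, category, path) VALUES (%s, %s, %s)", rows)` followed by a single `self.db.commit()`.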
In [4]:

```python
shopDAO = ShopDAO()
```
In [6]:

```python
# Create the table
shopDAO.table()
```
In [8]:

```python
from selenium.common.exceptions import WebDriverException

driver = webdriver.Chrome()
driver.implicitly_wait(3)
url = 'https://www.29cm.co.kr/shop/category/list?category_large_code=272100100&category_medium_code=272109100&sort=latest&category_small_code=&page=1&brand=&min_price=&max_price=&is_free_shipping=&is_discount=&is_soldout=&colors='
driver.get(url)
time.sleep(3)

# Walk category buttons li[4] .. li[8] of the category navigation
for i in range(4, 9):
    category_xpath = f'/html/body/shop-root/div/section/ui-list-category/div/ui-category-option/div/ruler-nav-category/div/div/div/ul/li[2]/ul/li[{i}]/button'
    category = driver.find_element(By.XPATH, category_xpath).text
    # One folder per category, named after the button text
    os.makedirs(category, exist_ok=True)

    # Product grid of the category currently shown
    content_xpath = '/html/body/shop-root/div/section/ui-list-category/div/div/div[3]/ul'
    content = driver.find_element(By.XPATH, content_xpath)
    image_elements = content.find_elements(By.TAG_NAME, 'img')
    name_elements = content.find_elements(By.CSS_SELECTOR, 'div.name')
    image_urls = [image_element.get_attribute('src') for image_element in image_elements]

    for image_url, name_element in zip(image_urls, name_elements):
        name = name_element.text
        # File name = last path segment of the URL, without the query string
        filename = image_url.split('/')[-1].split('?')[0]
        # Browser-like User-Agent, since the image server may reject urllib's default one
        req = Request(image_url, headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36'})
        path = f'./{category}/{filename}'
        with urlopen(req) as response:
            image_bytes = response.read()
        with open(path, 'wb') as f:
            f.write(image_bytes)
        # Save product name, category and image path in the MySQL `shop` table
        shopDAO.insert(name, category, path)

    # Open the next category; stop when there is no next button
    next_category_xpath = f'/html/body/shop-root/div/section/ui-list-category/div/ui-category-option/div/ruler-nav-category/div/div/div/ul/li[2]/ul/li[{i+1}]/button'
    try:
        driver.find_element(By.XPATH, next_category_xpath).click()
    except WebDriverException:
        break
    time.sleep(3)  # let the next category's products load before reading them
```
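The crawler derives each file name with `image_url.split('/')[-1].split('?')[0]` and uses the category text directly as a folder name. A slightly sturdier version does three things:

- Parses the URL with `urllib.parse.urlsplit`, which also drops a `#fragment`.
- Percent-decodes the file name.
- Replaces characters that Windows does not allow in folder names.

Without the last step, a category such as `셔츠/블라우스` (a made-up example) would be read as a folder path containing a subfolder. `image_filename` and `safe_folder_name` are hypothetical helpers, sketched as follows.

```python
import re
from urllib.parse import unquote, urlsplit


def image_filename(image_url):
    """Last path segment of the URL, without query string or fragment, percent-decoded."""
    return unquote(urlsplit(image_url).path.rsplit("/", 1)[-1])


def safe_folder_name(category):
    """Replace characters Windows forbids in file and folder names (<>:"/\\|?*) with '_'."""
    return re.sub(r'[<>:"/\\|?*]', "_", category).strip() or "_"
```

Inside the loop these would be used as `category = safe_folder_name(...)` and `filename = image_filename(image_url)`.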