notfounduser's diary

Posts

Showing posts from July, 2019

python error 'cp949' codec can't encode character

- July 29, 2019

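Running the Python script raises UnicodeEncodeError: 'cp949' codec can't encode character '\u2b50' in position 169: illegal multibyte sequence
1. Cause
The output file is opened for writing without an encoding, so Python falls back to the Windows default codec, cp949, which has no mapping for '\u2b50'.
f = open("i:/Share/pusan.txt", 'w')
data = data + link[0].text + ' '
if len(data) > 1000:
    f.write(data)
2. Fix
Pass utf-8 as the encoding when opening the file for writing:
f = open("i:/Share/pusan.txt", 'w', -1, "utf-8")
The keyword form open("i:/Share/pusan.txt", 'w', encoding="utf-8") is equivalent (-1 is just the default buffering) and easier to read.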
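The failure and the fix can be reproduced without the original scraper. A minimal sketch, assuming only the standard library: `can_encode` and `write_text` are hypothetical helper names, and the temporary file stands in for i:/Share/pusan.txt. It shows that the star character from the error message has no cp949 mapping, and that writing with an explicit utf-8 encoding round-trips it.

```python
import locale
import os
import tempfile

STAR_TEXT = 'star \u2b50'  # U+2B50 WHITE MEDIUM STAR, the character from the error

def can_encode(text, encoding):
    """Return True if every character in text can be represented in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

def write_text(path, text):
    # Explicit encoding: the bytes written no longer depend on the OS locale,
    # so characters outside cp949 are stored without error
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)

# The encoding open() uses when none is given (cp949 on Korean Windows)
print('default encoding:', locale.getpreferredencoding(False))
print('cp949 can encode it:', can_encode(STAR_TEXT, 'cp949'))
print('utf-8 can encode it:', can_encode(STAR_TEXT, 'utf-8'))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'pusan.txt')
    write_text(path, STAR_TEXT)
    with open(path, encoding='utf-8') as f:
        round_trip = f.read()
```

Files written this way must also be read back with encoding='utf-8'; otherwise the same locale default applies on the read side.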


python machine learning using Doc2Vec (3/3)

- July 18, 2019

Using Python, we train a model on sentences and test whether a given sentence is a question or a statement. Continuing from part 2, we now use the trained model to check how it classifies real input.
1. Environment
Windows 10
python 3.7
konlpy
gensim
2. edit qna_test.py
from collections import namedtuple
from gensim.models import doc2vec
from konlpy.tag import Twitter
import multiprocessing
from pprint import pprint
from gensim.models import Doc2Vec
from sklearn.linear_model import LogisticRegression
import numpy
import pickle
twitter = Twitter()
def read_data(filename):
    with open(filename, 'r', encoding='UTF8') as f:
        data = [line.split('\t') for line in f.read().splitlines()]
    return data
def tokenize(doc):
    # norm and stem are optional
    return ['/'.join(t) for t in twitter.pos(doc, norm=True, stem=True)]
# read the data to run the model on
run_data = read_data('C:/work/python/knlp/data/qna_run.txt')
# split into morphemes
run_docs = [(tokeniz...
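Part 3 imports pickle, presumably to load the classifier saved in part 2; the excerpt cuts off before that step. A minimal sketch under that assumption: the model is stored as a plain dict of weights and bias, and `classify` applies the decision rule a binary logistic regression uses (w·x + b > 0 means class 1). The dict layout, the file name qna_model.pkl, the label coding and the 2-D vectors (stand-ins for Doc2Vec output) are all hypothetical.

```python
import os
import pickle
import tempfile

# HYPOTHETICAL label coding: 1 = question, 0 = statement
LABELS = {0: 'statement', 1: 'question'}

def save_model(path, model):
    with open(path, 'wb') as f:
        pickle.dump(model, f)

def load_model(path):
    # Only load pickles you produced yourself: unpickling can execute code
    with open(path, 'rb') as f:
        return pickle.load(f)

def classify(model, vectors):
    """Linear decision rule of a binary logistic regression: w.x + b > 0 -> class 1."""
    labels = []
    for vec in vectors:
        score = sum(w * x for w, x in zip(model['weights'], vec)) + model['bias']
        labels.append(LABELS[1 if score > 0 else 0])
    return labels

# Hypothetical weights and "document vectors" standing in for Doc2Vec output
model = {'weights': [1.5, -0.5], 'bias': 0.0}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'qna_model.pkl')  # hypothetical file name
    save_model(path, model)
    restored = load_model(path)

print(classify(restored, [[1.0, 0.2], [-0.8, 0.1]]))
```

With a real scikit-learn model the same round trip works on the fitted LogisticRegression object itself, and its predict() replaces classify().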


python machine learning using Doc2Vec (2/3)

- July 17, 2019

Using Python, we train a model on sentences and test whether a given sentence is a question or a statement. Continuing from part 1, we build a classification model from the trained data.
1. Environment
Windows 10
python 3.7
konlpy
gensim
2. edit qna_test.py
from collections import namedtuple
from gensim.models import doc2vec
from konlpy.tag import Twitter
import multiprocessing
from pprint import pprint
from gensim.models import Doc2Vec
from sklearn.linear_model import LogisticRegression
import numpy
import pickle
twitter = Twitter()
def read_data(filename):
    with open(filename, 'r', encoding='UTF8') as f:
        data = [line.split('\t') for line in f.read().splitlines()]
    return data
def tokenize(doc):
    # norm and stem are optional
    return ['/'.join(t) for t in twitter.pos(doc, norm=True, stem=True)]
# read the training and test data
train_data = read_data('C:/work/python/knlp/data/qna_train.txt')
test_data = read_data('C:/wo...
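Part 2 imports scikit-learn's LogisticRegression, which is presumably fitted on the Doc2Vec document vectors. To show what that classifier learns, here is a minimal pure-Python logistic regression trained by batch gradient descent on the mean log-loss. This is a swapped-in stand-in for illustration, not scikit-learn's solver (which uses its own optimizers and applies L2 regularization by default); the toy 2-D vectors and the 1 = question / 0 = statement coding are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights w and bias b by batch gradient descent on the mean log-loss.

    X: list of feature vectors (e.g. document vectors), y: list of 0/1 labels.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss with respect to the score is (p - y)
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    # Class 1 when P(y=1|x) >= 0.5, i.e. when w.x + b >= 0
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

# Hypothetical toy vectors: label 1 = question, 0 = statement
X = [[2.0, 0.1], [1.5, -0.2], [-1.8, 0.3], [-2.2, -0.1]]
y = [1, 1, 0, 0]
w, b = fit_logistic(X, y)
print([predict(w, b, x) for x in X])
```

With real data, X would be the inferred 300-dimensional document vectors and y the labels read from qna_train.txt; the held-out test_data is what measures accuracy.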


python machine learning using Doc2Vec (1/3)

- July 17, 2019

Using Python, we train a model on sentences and test whether a given sentence is a question or a statement.
1. Environment
Windows 10
python 3.7
konlpy
gensim
2. edit qna_train.py
from collections import namedtuple
from gensim.models import doc2vec
from konlpy.tag import Twitter
import multiprocessing
from pprint import pprint
twitter = Twitter()
def read_data(filename):
    with open(filename, 'r', encoding='UTF8') as f:
        data = [line.split('\t') for line in f.read().splitlines()]
    return data
def tokenize(doc):
    # norm and stem are optional
    return ['/'.join(t) for t in twitter.pos(doc, norm=True, stem=True)]
# doc2vec parameters
cores = multiprocessing.cpu_count()
vector_size = 300
window_size = 15
word_min_count = 2
sampling_threshold = 1e-5
negative_size = 5
train_epoch = 100
dm = 1
worker_count = cores
# read the training data
train_data = read_data('C:/work/py...
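Part 1 imports namedtuple, which Doc2Vec tutorials commonly use to wrap each tokenized sentence with its tag before training (gensim also ships gensim.models.doc2vec.TaggedDocument with the same words and tags fields). The excerpt cuts off before that step, so the sketch below is an assumption about what follows read_data: it turns tab-separated rows into TaggedDocument records. Because KoNLPy needs a Java runtime, tokenize here is a hypothetical whitespace stand-in for twitter.pos, and the column layout (sentence, then label) and the Q/S labels are also assumed. KoNLPy 0.5.0 renamed Twitter to Okt; the old name still works but is deprecated.

```python
import io
from collections import namedtuple

# Same two fields as gensim's own doc2vec.TaggedDocument: words and tags
TaggedDocument = namedtuple('TaggedDocument', 'words tags')

def read_data(f):
    # One sample per line, columns separated by tabs, as in the post's read_data
    # (this version takes an open file object so it can be tried in memory)
    return [line.split('\t') for line in f.read().splitlines()]

def tokenize(doc):
    # HYPOTHETICAL stand-in for twitter.pos(doc, norm=True, stem=True):
    # split on whitespace and tag every token 'Unknown', keeping the
    # post's 'token/TAG' string format
    return ['/'.join((word, 'Unknown')) for word in doc.split()]

def to_tagged_docs(rows):
    # ASSUMPTION: column 0 is the sentence and column 1 is its label
    return [TaggedDocument(tokenize(row[0]), [row[1]]) for row in rows]

sample = io.StringIO('밥 먹었어?\tQ\n밥 먹었다\tS\n')
docs = to_tagged_docs(read_data(sample))
print(docs[0])
```

In current gensim versions the post's parameters correspond to the Doc2Vec keyword arguments vector_size, window, min_count, sample, negative, epochs, dm and workers, and a list like docs is what build_vocab() and train() consume.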


install docker (with oracle 11g)

- July 10, 2019

Install Oracle on top of Docker and give it a quick test.
1. Environment
Centos 7.x
2. install docker
[root@localhost ~]# yum install docker
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
 * base: data.aonenetworks.kr
 * epel: fedora.cs.nctu.edu.tw
 * extras: data.aonenetworks.kr
 * updates: data.aonenetworks.kr
Resolving Dependencies
--> Running transaction check
---> Package docker.x86_64 2:1.13.1-96.gitb2f74b2.el7.centos will be installed
--> Finished Dependency Resolution
Dependencies Resolved
==================================================================================================================================
 Package Arch Version ...
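The excerpt stops at the yum install log. A quick way to confirm from Python, the language used elsewhere on this blog, that the Docker client landed on the PATH is sketched below; docker_version is a hypothetical helper name. It only checks the CLI: the daemon still has to be started (on CentOS 7, for example with systemctl start docker) before any Oracle container can run.

```python
import shutil
import subprocess

def docker_version():
    """Return `docker --version` output, or None if the CLI is missing or fails."""
    exe = shutil.which('docker')
    if exe is None:
        return None
    # stdout/stderr=PIPE and universal_newlines keep this compatible with Python 3.6+
    result = subprocess.run([exe, '--version'], stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, universal_newlines=True)
    if result.returncode != 0:
        return None
    return result.stdout.strip()

print(docker_version() or 'docker CLI not found on PATH')
```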


install jupyter

- July 9, 2019

python jupyter setup
1. Environment
Centos 7.x
Anaconda Python 3.7 version
2. install
[root@localhost ~]# . ./venv/bin/activate
(venv) [root@localhost ~]# pip install jupyter
Collecting jupyter
  Using cached https://files.pythonhosted.org/packages/83/df/0f5dd132200728a86190397e1ea87cd76244e42d39ec5e88efd25b2abd7e/jupyter-1.0.0-py2.py3-none-any.whl
Collecting ipykernel (from jupyter)
  Downloading https://files.pythonhosted.org/packages/a0/35/dd97fbb48d4e6b5ae97307497e31e46691adc2feedb6279d29fc1c8ad9c1/ipykernel-5.1.1-py3-none-any.whl (114kB)
     |████████████████████████████████| 122kB 253kB/s
Collecting notebook (from jupyter)
  Using cached https://files.pythonhosted.org/packages/f6/36/89ebfffc9dd8c8dbd81c1ffb53e3d4233ee666414c143959477cb07cc5f5/notebook-5.7.8-py2.py3-none-any.whl
Collecting jupyter-console (from jupyter)
  Downloading https://files.pythonhosted.org/packages/cb/ee/6374ae8c21b7...
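After pip install jupyter finishes, a short check like the one below can confirm that the packages from the log (notebook, ipykernel, jupyter_console) are importable and that the interpreter really is the activated venv. This is a sketch; the helper names are hypothetical, and in_virtualenv relies on sys.prefix differing from sys.base_prefix, which holds for venv environments like the one activated above.

```python
import importlib.util
import sys

JUPYTER_MODULES = ['notebook', 'ipykernel', 'jupyter_console']

def in_virtualenv():
    """True when this interpreter runs inside a venv (sys.prefix != sys.base_prefix)."""
    return sys.prefix != getattr(sys, 'base_prefix', sys.prefix)

def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]

print('inside venv:', in_virtualenv())
print('missing:', missing_modules(JUPYTER_MODULES) or 'none')
```

Run it with the venv's own python (the one on PATH after activation); running it with the system interpreter would report the packages as missing.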
