1.hanlpå¯ä»¥ä½¿ç¨pythonè°ç¨å
hanlpå¯ä»¥ä½¿ç¨pythonè°ç¨å
å®è£ JDK
JPype并没æåIKVMé£æ ·å®ç°èªå·±çJVMï¼èæ¯ä»¥pipeæ¹å¼è°ç¨åçJVMãæ以æ们éè¦ä¸ä¸ªJVMï¼æ¯å¦ï¼
Oracle JDK
OpenJDK
å®è£ JDKé常ç®åï¼åæ¸ æ¥ä½åä½å³å¯ï¼å¿ é¡»ä¸OSåPythonçä½æ°ä¸è´ï¼å ·ä½å®è£ è¿ç¨ä¸åèµè¿°ã
å¯ä¸éè¦æ³¨æçæ¯ï¼å¿ 须设置ç¯å¢åéJAVA_HOMEå°JDKçæ ¹ç®å½ï¼JDKçå®è£ ç¨åºä¸ä¸å®ä¼å¸®ä½ åè¿ä¸æ¥ã
å®è£ ç¼è¯å·¥å ·é¾
Pythonçpackageä¸è¬æ¯ä»¥æºç å½¢å¼åå¸çï¼å ¶ä¸ä¸äºC代ç å¿ é¡»å¨ç¨æ·æºå¨ä¸ç¼è¯ï¼æ以éè¦å®è£ ç¼è¯å·¥å ·é¾ãå½ç¶ä½ ä¹å¯ä»¥è·³è¿è¿æ¥ï¼ç´æ¥ä¸è½½binaryã
Windows
å®è£ å è´¹çVisual C++ Express ã
Debian/Ubuntu
sudo apt-get install g++
Red Hat/Fedora
su -c 'yum install gcc-c++'
å®è£ JPype
æ¬æ读è åºè¯¥é½æ¯Pythonç¨åºåï¼æ以ç¥è¿äºå®è£ Pythonè¿ä¸æ¥ãä¸è¿å¿ 须注æçæ¯ï¼JPypeçæ¬ä¸Pythonç对åºå ¼å®¹å ³ç³»ï¼
Python2.xï¼JPype
Python3.x:JPype1-py3
使ç¨setup.pyå®è£
ä¸è½½æºç å解åï¼å¨ç®å½ä¸è¿è¡ï¼
*nix
sudo python3 setup.py install
Windows
python setup.py install
ç´æ¥ä¸è½½binary
å½ç¶ä½ ä¹å¯ä»¥éæ©ä¸è½½binaryï¼æ¯å¦JPype1-py3主页ä¸çbinaryå表ã
å¨Pycharmä¸å®è£
å¦æä½ æ£å¨ä½¿ç¨Pycharmè¿æ¬¾IDEçè¯ï¼é£ä¹äºæ å°±ç®åå¤äºã
é¦å å¨Project Interpreteréé¢ç¹å»å å·ï¼
æç´¢JPypeï¼éæ©ä½ éè¦ççæ¬å®è£ :
ç¨ççå»å°±å®è£ æåäºï¼
æµè¯å®è£ ç»æ
ç»äºåå°äºå代ç çå¼å¿æ¶é´äºï¼å¯ä»¥éè¿å¦ä¸ä»£ç æµè¯æ¯å¦å®è£ æåï¼
from jpype import *startJVM(getDefaultJVMPath())java.lang.System.out.println("hello world")shutdownJVM()
è¾åºå¦ä¸ç»æ表示å®è£ æåï¼
hello worldJVM activity report : classes loaded : JVM has been shutdown
è°ç¨HanLP
å ³äºHanLP
HanLPæ¯
ä¸ä¸ªè´åäºåç产ç¯å¢æ®åNLPææ¯çå¼æºJavaå·¥å ·å ï¼æ¯æä¸æåè¯ï¼N-æçè·¯åè¯ãCRFåè¯ãç´¢å¼åè¯ãç¨æ·èªå®ä¹è¯å ¸ãè¯æ§æ 注ï¼ï¼å½åå®ä½
è¯å«ï¼ä¸å½äººåãé³è¯äººåãæ¥æ¬äººåãå°åãå®ä½æºæåè¯å«ï¼ï¼å ³é®è¯æåï¼èªå¨æè¦ï¼çè¯æåï¼æ¼é³è½¬æ¢ï¼ç®ç¹è½¬æ¢ï¼ææ¬æ¨èï¼ä¾åå¥æ³åæ
ï¼MaxEntä¾åå¥æ³åæãç¥ç»ç½ç»ä¾åå¥æ³åæï¼ã
ä¸è½½HanLP
ä½ å¯ä»¥ç´æ¥ä¸è½½Portableççjarï¼é¶é ç½®ã
ä¹å¯ä»¥ä½¿ç¨èªå®ä¹çHanLPââHanLPç±3é¨åç»æï¼ç±»åºhanlp.jarå ã模ådataå ãé ç½®æ件hanlp.propertiesï¼è¯·åå¾é¡¹ç®ä¸»é¡µä¸è½½ææ°çï¼/hankcs/HanLP/releasesã对äºéportableçï¼ä¸è½½åï¼ä½ éè¦ç¼è¾é ç½®æ件第ä¸è¡çrootæådataçç¶ç®å½ï¼è¯¦è§ææ¡£ã
è¿éï¼å设æ°å»ºäºä¸ä¸ªç®å½ï¼åå®ä¸ºC:\hanlpï¼ï¼æhanlp.jaråhanlp.propertiesï¼portableççè¯ï¼ä» éä¸ä¸ªhanlp-portable.jarï¼æ¾è¿å»ï¼
Pythonè°ç¨
ä¸é¢æ¯ä¸ä»½Python3çè°ç¨ç¤ºä¾ï¼
# -*- coding:utf-8 -*-
# Filename: main.py
# Authorï¼hankcs
# Date: // :
from jpype import
*startJVM(getDefaultJVMPath(),码分arcgis engine源码 "-Djava.class.path=C:\hanlp\hanlp-1.2.8.jar;C:\hanlp", "-Xms1g", "-Xmx1g")
HanLP = JClass('com.hankcs.hanlp.HanLP')
# ä¸æåè¯
print(HanLP.segment('ä½ å¥½ï¼æ¬¢è¿å¨Pythonä¸è°ç¨HanLPçAPI'))
testCases = [
"åååæå¡",
"ç»å©çåå°æªç»å©çç¡®å®å¨å¹²æ°åè¯å",
"ä¹°æ°´æç¶åæ¥ä¸ååæåå»ä¸åä¼",
"ä¸å½çé¦é½æ¯å京",
"欢è¿æ°èå¸çåæ¥å°±é¤",
"工信å¤å¥³å¹²äºæ¯æç»è¿ä¸å±ç§å®¤é½è¦äº²å£äº¤ä»£å£äº¤æ¢æºçææ¯æ§å¨ä»¶çå®è£ å·¥ä½",
"éçé¡µæ¸¸å ´èµ·å°ç°å¨ç页游ç¹çï¼ä¾èµäºåæ¡£è¿è¡é»è¾å¤æç设计åå°äºï¼ä½è¿åä¹ä¸è½å®å ¨å¿½ç¥æã"]
for sentence in testCases: print(HanLP.segment(sentence))
# å½åå®ä½è¯å«ä¸è¯æ§æ 注
NLPTokenizer = JClass('com.hankcs.hanlp.tokenizer.NLPTokenizer')
print(NLPTokenizer.segment('ä¸å½ç§å¦é¢è®¡ç®ææ¯ç 究æçå®æåºæææ£å¨ææèªç¶è¯è¨å¤ç课ç¨'))
# å ³é®è¯æå
document = "æ°´å©é¨æ°´èµæºå¸å¸é¿éæå¿ 9ææ¥å¨å½å¡é¢æ°é»å举è¡çæ°é»åå¸ä¼ä¸éé²ï¼" \
"æ ¹æ®ååå®æäºæ°´èµæºç®¡çå¶åº¦çèæ ¸ï¼æé¨åçæ¥è¿äºçº¢çº¿çææ ï¼" \
"æé¨åçè¶ è¿çº¢çº¿çææ ã对ä¸äºè¶ è¿çº¢çº¿çå°æ¹ï¼éæå¿ è¡¨ç¤ºï¼å¯¹ä¸äºåç¨æ°´é¡¹ç®è¿è¡åºåçéæ¹ï¼" \
"ä¸¥æ ¼å°è¿è¡æ°´èµæºè®ºè¯åå水许å¯çæ¹åã"
print(HanLP.extractKeyword(document, 2))
# èªå¨æè¦
print(HanLP.extractSummary(document, 3))
# ä¾åå¥æ³åæ
print(HanLP.parseDependency("å¾å çè¿å ·ä½å¸®å©ä»ç¡®å®äºæç»éé¹°ãæ¾é¼ å麻éä½ä¸ºä¸»æ»ç®æ ã"))
shutdownJVM()