forked from openkylin/ukui-search
Compare commits
7 Commits
upstream
...
debian/uns
Author | SHA1 | Date |
---|---|---|
Mouse Zhang | d6cf04daaf | |
Mouse Zhang | b22b9dbac5 | |
Mouse Zhang | 1161dd3736 | |
iaom | a1a8996a6e | |
iaom | d5a0a23540 | |
Mouse Zhang | ab2aa5288c | |
Mouse Zhang | 5846705780 |
|
@ -1,3 +0,0 @@
|
||||||
[submodule "libchinese-segmentation"]
|
|
||||||
path = libchinese-segmentation
|
|
||||||
url = https://gitee.com/openkylin/chinese-segmentation.git
|
|
102
README.md
|
@ -3,26 +3,23 @@
|
||||||
[dWIP] UKUI Search is a user-wide desktop search feature of UKUI desktop environment.
|
[dWIP] UKUI Search is a user-wide desktop search feature of UKUI desktop environment.
|
||||||
|
|
||||||
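## Introduction
|
## Introduction
|
||||||
In the narrow sense, ukui-search refers to the global search application of the UKUI desktop environment; the latest version is currently 4.0.x.x. The global search application provides aggregated search across local files, text content, applications, settings items, notes and more, and its file indexing gives users a fast and accurate search experience.
|
In the narrow sense, ukui-search refers to the global search application of the UKUI desktop environment; the latest version is currently 3.22.x.x. The global search application provides aggregated search across local files, text content, applications, settings items, notes and more, and its file indexing gives users a fast and accurate search experience.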
|
||||||
|
|
||||||
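In the broad sense, ukui-search covers not only the global search application but also the local search service of the UKUI desktop environment and its development interfaces. Built on basic data-source services such as the file index service and the application search data service, it offers search functionality through a C++ interface, which application developers can use directly by linking against the shared library. In addition, the UKUI desktop search service provides a set of plugin interfaces based on the Qt plugin framework, which users can implement to extend the search functionality.
|
In the broad sense, ukui-search covers not only the global search application but also the local search service of the UKUI desktop environment and its development interfaces. Built on basic data-source services such as the file index service and the application search data service, it offers search functionality through a C++ interface, which application developers can use directly by linking against the shared library. In addition, the UKUI desktop search service provides a set of plugin interfaces based on the Qt plugin framework, which users can implement to extend the search functionality.
|
||||||
Unless stated otherwise, ukui-search below refers to the latter.
|
Unless stated otherwise, ukui-search below refers to the latter.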
|
||||||
|
|
||||||
ukui-search is currently split into 9 packages (openkylin):
|
ukui-search is currently split into 6 packages (openkylin):
|
||||||
+ ukui-search_xxxxxx.deb
|
+ ukui-search_xxxxxx.deb
|
||||||
+ ukui-search-service_xxxx.deb
|
|
||||||
+ libukui-search-dev_xxxxx.deb
|
+ libukui-search-dev_xxxxx.deb
|
||||||
+ libukui-search2_xxxxx.deb
|
+ libukui-search2_xxxxx.deb
|
||||||
+ libukui-search-common_xxxxx.deb
|
|
||||||
+ libchinese-segmentation1_xxxx.deb
|
+ libchinese-segmentation1_xxxx.deb
|
||||||
+ libchinese-segmentation-dev_xxxx.deb
|
+ libchinese-segmentation-dev_xxxx.deb
|
||||||
+ libchinese-segmentation-common_xxxx.deb
|
|
||||||
+ ukui-search-systemdbus_xxxxx.deb
|
+ ukui-search-systemdbus_xxxxx.deb
|
||||||
|
|
||||||
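xxx stands for the version number. ukui-search is the global search application itself; ukui-search-service contains the processes related to the search data service; the libukui-search package provides the basic search service functionality and the extension interfaces, and libukui-search-dev is its development package. The libchinese-segmentation package gives the search service NLP capabilities such as Chinese word segmentation. The ukui-search-systemdbus package provides some privileged operations over the system D-Bus.
|
xxx stands for the version number. ukui-search is the global search application itself; the libukui-search package provides the basic search service functionality and the extension interfaces, and libukui-search-dev is its development package. The libchinese-segmentation package gives the search service NLP capabilities such as Chinese word segmentation. The ukui-search-systemdbus package provides some privileged operations over the system D-Bus.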
|
||||||
|
|
||||||
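## Running
|
## Running
|
||||||
The search service involves five processes: ukui-search (the global search GUI), ukui-search-service (the file search service), ukui-search-service-dir-manager (the file search directory management module), ukui-search-app-data-service (the application data service) and ukuisearch-systemdbus (system bus).
|
The search service involves five processes: ukui-search (the global search GUI), ukui-search-service (the file search service), ukui-search-service-dir-manager (file search directory management), ukui-search-app-data-service (the application data service) and ukuisearch-systemdbus (system bus).
|
||||||
|
|
||||||
All processes start automatically at boot by default.
|
All processes start automatically at boot by default.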
|
||||||
|
|
||||||
|
@ -73,16 +70,9 @@ interface: com.ukui.search.service
|
||||||
|
|
||||||
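Some search features depend on other desktop environment components:
|
Some search features depend on other desktop environment components:
|
||||||
|
|
||||||
Settings search: depends on the D-Bus interface provided by ukui-control-center:
|
Settings search: depends on a configuration file provided by ukui-control-center, installed at: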
|
||||||
|
|
||||||
```
|
> /usr/share/ukui-control-center/shell/res/search.xml
|
||||||
service:org.ukui.ukcc.session
|
|
||||||
path:/
|
|
||||||
interface:org.ukui.ukcc.session.interface
|
|
||||||
method:getSearchItems () ↦ (Dict of {String, Variant} arg_0)
|
|
||||||
signal:searchItemsAdd(Dict of{String, Variant})
|
|
||||||
searchItemsDelete(Dict of{String, Variant})
|
|
||||||
```
|
|
||||||
|
|
||||||
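Jumping to the control panel page that corresponds to a search result uses the ukui-control-center command line:
|
Jumping to the control panel page that corresponds to a search result uses the ukui-control-center command line: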
|
||||||
|
|
||||||
|
@ -101,7 +91,7 @@ Options:
|
||||||
service: com.kylin.softwarecenter.getsearchresults
|
service: com.kylin.softwarecenter.getsearchresults
|
||||||
path: /com/kylin/softwarecenter/getsearchresults
|
path: /com/kylin/softwarecenter/getsearchresults
|
||||||
interface: com.kylin.getsearchresults
|
interface: com.kylin.getsearchresults
|
||||||
method:get_search_result (String keyword) ↦ (Boolean arg_1)
|
get_search_result (String keyword) ↦ (Boolean arg_1)
|
||||||
```
|
```
|
||||||
|
|
||||||
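Jumping to the Software Store installation page uses the following D-Bus interface:
|
Jumping to the Software Store installation page uses the following D-Bus interface: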
|
||||||
|
@ -139,7 +129,7 @@ interface: org.freedesktop.FileManager1
|
||||||
|
|
||||||
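## Principles and Features
|
## Principles and Features
|
||||||
|
|
||||||
Global search supports searching control panel settings, applications, files and notes. It supports searching by name, pinyin, or pinyin initials (text content search and note search do not support pinyin). Settings search obtains its data through the D-Bus interface provided by the control panel, and opening the corresponding control panel page also relies on the command line that the control panel provides. Application search covers locally installed applications (including Android-compatible applications) and online applications published in the Software Store; searching for online applications and jumping to their installation page go through interfaces provided by the Software Store. So when you suspect a problem with settings search or application search, you can test the corresponding control panel or Software Store interface directly.
|
Global search supports searching control panel settings, applications, files and notes. It supports searching by name, pinyin, or pinyin initials (text content search and note search do not support pinyin). Settings search works by reading the configuration file provided by the control panel, and opening the corresponding control panel page also relies on the command line that the control panel provides. Application search covers locally installed applications (including Android-compatible applications) and online applications published in the Software Store; searching for online applications and jumping to their installation page go through interfaces provided by the Software Store. So when you suspect a problem with settings search or application search, you can test the corresponding control panel or Software Store interface directly.
|
||||||
|
|
||||||
File search is divided into file name (folder name) search and text content search. It has two modes: `direct search` and `index search`.
|
File search is divided into file name (folder name) search and text content search. It has two modes: `direct search` and `index search`.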
|
||||||
|
|
||||||
|
@ -148,7 +138,6 @@ interface: org.freedesktop.FileManager1
|
||||||
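+ Index search: an index database is built by traversing the file system (which takes some time and resources), and searches then run directly against the database, giving millisecond-level response times. While the index is being built, search results may be incomplete or empty.
|
+ Index search: an index database is built by traversing the file system (which takes some time and resources), and searches then run directly against the database, giving millisecond-level response times. While the index is being built, search results may be incomplete or empty.
|
||||||
When indexing is first enabled, the ukui-search-service process creates two databases, one storing basic index information (for file name search) and one storing text content index information (for text content search). After the initial indexing finishes, the index service relies on inotify to watch for changes and update in real time. When indexing is turned off and on again, or the service restarts, the index service traverses the files and validates the database in order to update it incrementally.
|
When indexing is first enabled, the ukui-search-service process creates two databases, one storing basic index information (for file name search) and one storing text content index information (for text content search). After the initial indexing finishes, the index service relies on inotify to watch for changes and update in real time. When indexing is turned off and on again, or the service restarts, the index service traverses the files and validates the database in order to update it incrementally.
|
||||||
The index database is updated in real time based on file system monitoring. However, because parsing text takes time, index updates for large files may lag briefly. Unexpected events, such as a power loss during an index update, can corrupt the index; in that case the index is rebuilt at the next boot so that file search keeps working. Depending on the machine's configuration and the number, size and types of local files, rebuilding the index can take anywhere from a few seconds to several minutes.
|
The index database is updated in real time based on file system monitoring. However, because parsing text takes time, index updates for large files may lag briefly. Unexpected events, such as a power loss during an index update, can corrupt the index; in that case the index is rebuilt at the next boot so that file search keeps working. Depending on the machine's configuration and the number, size and types of local files, rebuilding the index can take anywhere from a few seconds to several minutes.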
|
||||||
Search directories can be configured manually in the control panel, and indexing now supports external devices.
|
|
||||||
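Index search supports text content search; for the basic principle see [Inverted Index and Ubuntu Kylin File Search](https://docs.qq.com/doc/DU0p0S1lRelp2aW1y). When the index is built, common text file formats are parsed and their keywords are extracted and stored in the database. At search time, keywords are also extracted from the user's input and matched against the keywords in the database, so the text index does not guarantee that searching for an arbitrary piece of a text file's content will find that file, nor is that a common use case. The search input must contain a [keyword]. For example, searching for '的' returns no files, because '的' is not a keyword of any file. In fact, there is a stop-word list specifically for excluding words such as '我' ("I"), '的' (a possessive particle) and '于是' ("so") that appear in almost every document and carry no useful meaning. The file formats currently supported for parsing are: docx, pptx, xlsx, txt (most encodings), doc, dot, wps, ppt, pps, dps, et, xls, pdf, uof, uot, uos, uop and ofd. Encrypted files are not supported for any of these formats. In addition, the file index can extract text from images via OCR, so you can also find an image by the text in it (just like a document). Supported image formats: png, bmp, gif, tif, tiff, webp, jpe, jpg, jpeg.
|
Index search supports text content search; for the basic principle see [Inverted Index and Ubuntu Kylin File Search](https://docs.qq.com/doc/DU0p0S1lRelp2aW1y). When the index is built, common text file formats are parsed and their keywords are extracted and stored in the database. At search time, keywords are also extracted from the user's input and matched against the keywords in the database, so the text index does not guarantee that searching for an arbitrary piece of a text file's content will find that file, nor is that a common use case. The search input must contain a [keyword]. For example, searching for '的' returns no files, because '的' is not a keyword of any file. In fact, there is a stop-word list specifically for excluding words such as '我' ("I"), '的' (a possessive particle) and '于是' ("so") that appear in almost every document and carry no useful meaning. The file formats currently supported for parsing are: docx, pptx, xlsx, txt (most encodings), doc, dot, wps, ppt, pps, dps, et, xls, pdf, uof, uot, uos, uop and ofd. Encrypted files are not supported for any of these formats. In addition, the file index can extract text from images via OCR, so you can also find an image by the text in it (just like a document). Supported image formats: png, bmp, gif, tif, tiff, webp, jpe, jpg, jpeg.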
|
||||||
|
|
||||||
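> Note: an application's .desktop file is neither the application itself nor a "shortcut"; to the search it is just a file, so searching for the name of a desktop file will not find the application unless the two happen to share a name. Also, desktop files shown in file search are not displayed as applications but as what they really are: a file.
|
> Note: an application's .desktop file is neither the application itself nor a "shortcut"; to the search it is just a file, so searching for the name of a desktop file will not find the application unless the two happen to share a name. Also, desktop files shown in file search are not displayed as applications but as what they really are: a file.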
|
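The keyword matching described above can be illustrated with a toy inverted index. This is only a sketch under stated assumptions: the `ToyIndex` class, its whitespace tokenizer and the stop-word set passed to it are hypothetical stand-ins invented for this example, and combining query keywords with AND is an assumption as well. It is not the ukui-search implementation, which uses a real Chinese segmenter and an on-disk database.

```c++
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy index; not the ukui-search implementation.
class ToyIndex {
public:
    explicit ToyIndex(std::set<std::string> stopWords)
        : m_stopWords(std::move(stopWords)) {}

    // Split text on whitespace and drop stop words.
    std::vector<std::string> keywords(const std::string &text) const {
        std::vector<std::string> result;
        std::istringstream stream(text);
        std::string word;
        while (stream >> word) {
            if (m_stopWords.count(word) == 0) {
                result.push_back(word);
            }
        }
        return result;
    }

    // Record every keyword of a document under the document's path.
    void addDocument(const std::string &path, const std::string &text) {
        for (const std::string &word : keywords(text)) {
            m_postings[word].insert(path);
        }
    }

    // Return the paths that contain every keyword of the query.
    // A query made only of stop words has no keywords and matches nothing.
    std::set<std::string> search(const std::string &query) const {
        const std::vector<std::string> words = keywords(query);
        if (words.empty()) {
            return {};
        }
        std::set<std::string> hits;
        bool first = true;
        for (const std::string &word : words) {
            const auto it = m_postings.find(word);
            if (it == m_postings.end()) {
                return {};
            }
            if (first) {
                hits = it->second;
                first = false;
                continue;
            }
            // Keep only the paths that also contain this keyword.
            std::set<std::string> common;
            for (const std::string &path : hits) {
                if (it->second.count(path) != 0) {
                    common.insert(path);
                }
            }
            hits = std::move(common);
        }
        return hits;
    }

private:
    std::set<std::string> m_stopWords;
    std::map<std::string, std::set<std::string>> m_postings;
};
```

With "the" and "of" registered as stop words, a query consisting only of "the" yields no results even if every indexed document contains the word, which mirrors the '的' example in the README.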
||||||
|
@ -161,13 +150,11 @@ ukui-search应用和ukui-search-service、ukui-search-app-data-service的配置
|
||||||
|
|
||||||
File descriptions:
|
File descriptions:
|
||||||
|
|
||||||
+ ukui-search.conf ------------------------------------- Global search GUI configuration file.
|
+ ukui-search.conf ------------------------------------ Global search GUI configuration file.
|
||||||
+ ukui-search-plugin-order.conf ------------------- Display order of the search plugins
|
+ ukui-search-block-dirs.conf --------------------- File search blocklist, set in the control panel
|
||||||
+ ukui-search-block-dirs.conf ---------------------- File search blocklist, set in the control panel
|
+ ukui-search-index-status.conf ------------------ Status record of the file index service
|
||||||
+ ukui-search-index-status.conf ------------------- Status record of the file index service
|
+ index_data --------------------------------------------- File index database
|
||||||
+ ukui-search-current-indexable-dir.conf ------- Search directory configuration file
|
+ content_index_data --------------------------------- Text content database
|
||||||
+ index_data -------------------------------------------- File index database
|
|
||||||
+ content_index_data -------------------------------- Text content database
|
|
||||||
|
|
||||||
## Building
|
## Building
|
||||||
|
|
||||||
|
@ -194,11 +181,11 @@ mkdir build;cd build;qmake ..;make
|
||||||
|
|
||||||
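## Debugging
|
## Debugging
|
||||||
|
|
||||||
ukui-search does not currently use the logging facility of the ukui-log4qt module. To debug, create the files `ukui-search.log`, `ukui-search-service.log` and `ukui-search-app-data-service.log` in the ~/.config/org.ukui/ directory; they correspond to the global search GUI application, the global search file index service and the application data service respectively. Once the log files exist, logs are written to them automatically, but there is currently no automatic log backup or cleanup.
|
ukui-search does not currently use the logging facility of the ukui-log4qt module. To debug, create the files `ukui-search.log`, `ukui-search-service.log` and `ukui-search-app-data-service.log` in the following directory; they correspond to the global search GUI application, the global search file index service and the application data service respectively. Once the log files exist, logs are written to them automatically, but there is currently no automatic log backup or cleanup.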
|
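Creating the three log files can be scripted. The following is a minimal sketch using only the C++17 standard library; the helper name `createSearchLogFiles` is hypothetical and not part of ukui-search, and the caller is assumed to pass the configuration directory (the upstream README names ~/.config/org.ukui/).

```c++
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: make sure the log files named in the README exist
// inside `configDir`. Existing files keep their content.
// Returns the paths that could be opened for appending.
std::vector<fs::path> createSearchLogFiles(const fs::path &configDir) {
    const std::vector<std::string> names = {
        "ukui-search.log",
        "ukui-search-service.log",
        "ukui-search-app-data-service.log",
    };
    fs::create_directories(configDir);
    std::vector<fs::path> opened;
    for (const std::string &name : names) {
        const fs::path file = configDir / name;
        // Append mode creates the file if it is missing and never truncates it.
        std::ofstream stream(file, std::ios::app);
        if (stream) {
            opened.push_back(file);
        }
    }
    return opened;
}
```

Because there is no automatic cleanup, it is reasonable to delete these files again once debugging is finished.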
||||||
|
|
||||||
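## Development Interfaces
|
## Development Interfaces
|
||||||
|
|
||||||
### Search service interface (this interface is changing rapidly; treat the code as authoritative)
|
### Search service interface (this interface is changing rapidly; treat the code as authoritative)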
|
||||||
|
|
||||||
#### Use with CMake:
|
#### Use with CMake:
|
||||||
|
|
||||||
|
@ -223,33 +210,20 @@ PKGCONFIG += ukui-search
|
||||||
......
|
......
|
||||||
// Initialize a search instance
|
// Initialize a search instance
|
||||||
UkuiSearch::UkuiSearchTask ukst;
|
UkuiSearch::UkuiSearchTask ukst;
|
||||||
// Initialize the search plugin to be used
|
|
||||||
ukst.initSearchPlugin(UkuiSearch::SearchProperty::SearchType::File);
|
|
||||||
// Initialize the result queue
|
// Initialize the result queue
|
||||||
UkuiSearch::DataQueue<UkuiSearch::ResultItem> *queue = ukst.init();
|
UkuiSearch::DataQueue<UkuiSearch::ResultItem> *queue = ukst.init();
|
||||||
// Set the maximum number of results (default: 100)
|
// Load the search plugin you want to use
|
||||||
ukst.setMaxResultNum(999999);
|
ukst.initSearchPlugin(UkuiSearch::SearchType::File);
|
||||||
// Add a search directory
|
|
||||||
QString path = "/home/usr/下载";
|
|
||||||
ukst.addSearchDir(path);
|
|
||||||
// Set the properties to fetch; they are stored in UkuiSearch::ResultItem
|
|
||||||
ukst.setResultProperties(UkuiSearch::SearchProperty::SearchType::File,
|
|
||||||
UkuiSearch::SearchResultProperties{UkuiSearch::SearchProperty::FilePath,
|
|
||||||
UkuiSearch::SearchProperty::FileIconName});
|
|
||||||
// Add a keyword. Multiple keywords can be added and are combined with AND. Note: to add a new set of keywords, first call clearKeyWords to clear the existing ones.
|
|
||||||
ukst.addKeyword(searchText);
|
|
||||||
// Add search conditions
|
// Add search conditions
|
||||||
ukst.setOnlySearchFile(true);
|
ukst.setOnlySearchFile(true);
|
||||||
// Run the search; the argument selects the search plugin that performs it. Note: before each search you can call ‘’
|
ukst.addKeyword(m_keyword);
|
||||||
ukst.startSearch(UkuiSearch::SearchProperty::SearchType::File);
|
// Start the search (asynchronous)
|
||||||
|
ukst.startSearch(UkuiSearch::SearchType::File);
|
||||||
// Receive results (example)
|
// Receive results (example)
|
||||||
while(!queue->isEmpty()) {
|
while(true) {
|
||||||
auto result = queue->dequeue();
|
if(!queue->isEmpty()) {
|
||||||
// Read a value by property
|
qDebug() << queue->dequeue().getItemKey();
|
||||||
qDebug() << result.getValue(UkuiSearch::SearchProperty::FilePath);
|
}
|
||||||
// Get all values at once
|
|
||||||
UkuiSearch::SearchResultPropertyMap map = result.getAllValue();
|
|
||||||
qDebug() << map;
|
|
||||||
}
|
}
|
||||||
```
|
```
|
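Because the search is started asynchronously, a loop that stops as soon as the queue is empty can exit before any result has arrived. The sketch below shows one way to drain a queue until the producer reports completion. `ToyQueue`, `drain` and the `finished` flag are hypothetical stand-ins written with the standard library only; the README shows just `isEmpty()` and `dequeue()` on `UkuiSearch::DataQueue`, so how the real library signals the end of a search is not covered here.

```c++
#include <atomic>
#include <chrono>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a result queue that a search thread fills.
template <typename T>
class ToyQueue {
public:
    void enqueue(T value) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_items.push(std::move(value));
    }

    // Pops one item into `out`; returns false when the queue is empty.
    bool tryDequeue(T &out) {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_items.empty()) {
            return false;
        }
        out = std::move(m_items.front());
        m_items.pop();
        return true;
    }

private:
    std::mutex m_mutex;
    std::queue<T> m_items;
};

// Collect results until the producer has finished and the queue is drained.
std::vector<std::string> drain(ToyQueue<std::string> &queue,
                               const std::atomic<bool> &finished) {
    std::vector<std::string> results;
    std::string item;
    while (true) {
        if (queue.tryDequeue(item)) {
            results.push_back(item);
        } else if (finished.load()) {
            // Drain once more: items may have arrived just before completion.
            while (queue.tryDequeue(item)) {
                results.push_back(item);
            }
            break;
        } else {
            // Nothing yet; wait briefly instead of spinning at full speed.
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }
    return results;
}
```

In a real program the producer would run in its own thread, enqueue its results and set `finished` to true as its last step; in a Qt GUI application a timer or signal would be preferable to sleeping in a loop.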
||||||
|
|
||||||
|
@ -293,30 +267,24 @@ Q_DECLARE_INTERFACE(UkuiSearch::SearchTaskPluginIface, SearchTaskPluginIface_iid
|
||||||
indicates that a user plugin should be loaded
|
indicates that a user plugin should be loaded
|
||||||
|
|
||||||
```c++
|
```c++
|
||||||
ukst.initSearchPlugin(UkuiSearch::SearchType::Custom, "<user-defined name>");
|
ukst.initSearchPlugin(UkuiSearch::SearchType::Custom);
|
||||||
```
|
```
|
||||||
|
|
||||||
Start the search
|
Start the search
|
||||||
|
|
||||||
```c++
|
```c++
|
||||||
ukst.startSearch(UkuiSearch::SearchType::Custom, "<user-defined name>");
|
ukst.startSearch(UkuiSearch::SearchType::<user-defined name>);
|
||||||
```
|
```
|
||||||
|
|
||||||
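### Search application plugin interface
|
### Search application plugin interface
|
||||||
|
|
||||||
The search application itself also provides a plugin interface; by loading user-implemented plugins it can offer additional search features and customized detail pages:
|
The search application itself also provides a plugin interface; by loading user-implemented plugins it can offer additional search features: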
|
||||||
|
|
||||||
```c++
|
```c++
|
||||||
|
namespace UkuiSearch {
|
||||||
class SearchPluginIface : public PluginInterface
|
class SearchPluginIface : public PluginInterface
|
||||||
{
|
{
|
||||||
public:
|
public:
|
||||||
enum InvokableAction
|
|
||||||
{
|
|
||||||
None = 1u << 0,
|
|
||||||
HideUI = 1u << 1
|
|
||||||
};
|
|
||||||
Q_DECLARE_FLAGS(InvokableActions, InvokableAction)
|
|
||||||
|
|
||||||
struct DescriptionInfo
|
struct DescriptionInfo
|
||||||
{
|
{
|
||||||
QString key;
|
QString key;
|
||||||
|
@ -337,15 +305,6 @@ public:
|
||||||
QVector<DescriptionInfo> description;
|
QVector<DescriptionInfo> description;
|
||||||
QString actionKey;
|
QString actionKey;
|
||||||
int type;
|
int type;
|
||||||
ResultInfo(const QIcon &iconToSet = QIcon(), const QString &nameToSet = QString(),
|
|
||||||
const QVector<DescriptionInfo> &descriptionToSet = QVector<DescriptionInfo>(),
|
|
||||||
const QString &actionKeyToSet = QString(), const int &typeToSet = 0) {
|
|
||||||
icon = iconToSet;
|
|
||||||
name = nameToSet;
|
|
||||||
description = descriptionToSet;
|
|
||||||
actionKey = actionKeyToSet;
|
|
||||||
type = typeToSet;
|
|
||||||
}
|
|
||||||
};
|
};
|
||||||
|
|
||||||
virtual ~SearchPluginIface() {}
|
virtual ~SearchPluginIface() {}
|
||||||
|
@ -354,12 +313,9 @@ public:
|
||||||
virtual void stopSearch() = 0;
|
virtual void stopSearch() = 0;
|
||||||
virtual QList<Actioninfo> getActioninfo(int type) = 0;
|
virtual QList<Actioninfo> getActioninfo(int type) = 0;
|
||||||
virtual void openAction(int actionkey, QString key, int type) = 0;
|
virtual void openAction(int actionkey, QString key, int type) = 0;
|
||||||
// virtual bool isPreviewEnable(QString key, int type) = 0;
|
|
||||||
// virtual QWidget *previewPage(QString key, int type, QWidget *parent = nullptr) = 0;
|
|
||||||
virtual QWidget *detailPage(const ResultInfo &ri) = 0;
|
virtual QWidget *detailPage(const ResultInfo &ri) = 0;
|
||||||
|
|
||||||
void invokeActions(InvokableActions actions);
|
|
||||||
};
|
};
|
||||||
|
}
|
||||||
```
|
```
|
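The `InvokableAction` values in the upstream version of the interface are bit flags, which `Q_DECLARE_FLAGS` wraps in a `QFlags` type. The following sketch mirrors the idea in plain C++ without Qt; `Action`, `ActionMask`, `toMask`, `combine` and `hasAction` are hypothetical names chosen for the example. Note that `None` occupies bit 0 rather than being zero, so it can be tested for like any other flag.

```c++
#include <cstdint>

// Hypothetical plain-C++ mirror of the InvokableAction values shown above.
enum class Action : std::uint32_t {
    None = 1u << 0,
    HideUI = 1u << 1,
};

using ActionMask = std::uint32_t;

// Convert a single action to its bit pattern.
constexpr ActionMask toMask(Action action) {
    return static_cast<ActionMask>(action);
}

// Combine two actions into one mask.
constexpr ActionMask combine(Action a, Action b) {
    return toMask(a) | toMask(b);
}

// Check whether a mask contains the given action.
constexpr bool hasAction(ActionMask mask, Action action) {
    return (mask & toMask(action)) != 0;
}
```

A plugin could pass such a mask to request several actions at once, and the receiver would test each bit with `hasAction`.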
||||||
|
|
||||||
> Notes on using the interface:
|
> Notes on using the interface:
|
||||||
|
|
|
@ -0,0 +1,5 @@
|
||||||
|
ukui-search (3.22.4.2) unstable; urgency=medium
|
||||||
|
|
||||||
|
* Initial
|
||||||
|
|
||||||
|
-- MouseZhang <sendbypython@foxmail.com> Fri, 10 Feb 2023 11:26:15 +0800
|
|
@ -0,0 +1,96 @@
|
||||||
|
Source: ukui-search
|
||||||
|
Section: utils
|
||||||
|
Priority: optional
|
||||||
|
Maintainer: Kylin Team <team+kylin@tracker.debian.org>
|
||||||
|
Uploaders: MouseZhang <sendbypython@foxmail.com>
|
||||||
|
Build-Depends: debhelper-compat (=13),
|
||||||
|
pkgconf,
|
||||||
|
libgsettings-qt-dev,
|
||||||
|
qtbase5-dev,
|
||||||
|
qt5-qmake,
|
||||||
|
qtchooser,
|
||||||
|
qtscript5-dev,
|
||||||
|
qttools5-dev-tools,
|
||||||
|
libxapian-dev,
|
||||||
|
libquazip5-dev(>=0.7.6-6build1),
|
||||||
|
libglib2.0-dev,
|
||||||
|
libkf5windowsystem-dev,
|
||||||
|
libqt5x11extras5-dev,
|
||||||
|
libuchardet-dev,
|
||||||
|
libpoppler-qt5-dev,
|
||||||
|
libukui-log4qt-dev,
|
||||||
|
libqt5xdg-dev,
|
||||||
|
libukcc-dev,
|
||||||
|
libopencv-dev,
|
||||||
|
libtesseract-dev,
|
||||||
|
libkysdk-waylandhelper-dev,
|
||||||
|
libkysdk-qtwidgets-dev,
|
||||||
|
libukui-appwidget-manager-dev,
|
||||||
|
libukui-appwidget-provider-dev,
|
||||||
|
libukui-appwidget-qmlplugin0,
|
||||||
|
qml-module-org-ukui-stylehelper,
|
||||||
|
qtdeclarative5-dev
|
||||||
|
Standards-Version: 4.6.1.0
|
||||||
|
Rules-Requires-Root: no
|
||||||
|
Homepage: https://www.ukui.org/
|
||||||
|
Vcs-Git: https://gitee.com/openkylin/ukui-search.git
|
||||||
|
Vcs-Browser: https://gitee.com/openkylin/ukui-search
|
||||||
|
|
||||||
|
Package: ukui-search
|
||||||
|
Architecture: any
|
||||||
|
Depends: ${misc:Depends},
|
||||||
|
${shlibs:Depends},
|
||||||
|
libukui-search2 (= ${binary:Version}),
|
||||||
|
Description: User-wide desktop search feature of UKUI desktop environment
|
||||||
|
Gui application that provides file search,
|
||||||
|
application search,settings search functions,
|
||||||
|
and so on.
|
||||||
|
|
||||||
|
Package: libchinese-segmentation1
|
||||||
|
Section: libs
|
||||||
|
Architecture: any
|
||||||
|
Depends: ${misc:Depends},
|
||||||
|
${shlibs:Depends}
|
||||||
|
Description: Libraries for chinese-segmentation
|
||||||
|
This package contains a few runtime libraries needed by
|
||||||
|
libsearch.
|
||||||
|
|
||||||
|
Package: libchinese-segmentation-dev
|
||||||
|
Section: libdevel
|
||||||
|
Architecture: any
|
||||||
|
Depends: ${misc:Depends},
|
||||||
|
${shlibs:Depends},
|
||||||
|
libchinese-segmentation1 (= ${binary:Version}),
|
||||||
|
Description: Libraries for chinese-segmentation(development files)
|
||||||
|
This package contains NLP functions used by ukui-search.
|
||||||
|
|
||||||
|
Package: libukui-search2
|
||||||
|
Section: libs
|
||||||
|
Architecture: any
|
||||||
|
Depends: ${misc:Depends},
|
||||||
|
${shlibs:Depends},
|
||||||
|
libchinese-segmentation1 (= ${binary:Version}),
|
||||||
|
ukui-search-systemdbus (= ${binary:Version})
|
||||||
|
Provides: libukui-search,
|
||||||
|
Description: Libraries for ukui-search
|
||||||
|
This package provides libraries for ukui-search,
|
||||||
|
and contains some binaries that implement the search functions,
|
||||||
|
which are ukui-search-service, ukui-search-app-data-service
|
||||||
|
and ukui-search-service-dir-manager.
|
||||||
|
|
||||||
|
Package: libukui-search-dev
|
||||||
|
Section: libdevel
|
||||||
|
Architecture: any
|
||||||
|
Depends: ${misc:Depends},
|
||||||
|
${shlibs:Depends},
|
||||||
|
libukui-search2 (= ${binary:Version}),
|
||||||
|
Description: Libraries for ukui-search(development files)
|
||||||
|
This package can be used to implement a gui application.
|
||||||
|
|
||||||
|
Package: ukui-search-systemdbus
|
||||||
|
Architecture: any
|
||||||
|
Depends: ${shlibs:Depends},
|
||||||
|
${misc:Depends},
|
||||||
|
Description: Systembus interface to modify max_user_watches nums permanent
|
||||||
|
This package contains functions used when ukui-search want to
|
||||||
|
modify some system settings.
|
|
@ -0,0 +1,267 @@
|
||||||
|
Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
|
||||||
|
|
||||||
|
Files: *
|
||||||
|
Copyright: 2020, KylinSoft Co., Ltd.
|
||||||
|
License: GPL-3+
|
||||||
|
|
||||||
|
Files: 3rd-parties/*
|
||||||
|
Copyright: 2020, KylinSoft Co., Ltd.
|
||||||
|
2015-2018, Itay Grudev
|
||||||
|
License: Expat
|
||||||
|
|
||||||
|
Files: 3rd-parties/SingleApplication/CHANGELOG.md
|
||||||
|
3rd-parties/SingleApplication/Windows.md
|
||||||
|
Copyright: 2020, KylinSoft Co., Ltd.
|
||||||
|
License: GPL-3+
|
||||||
|
|
||||||
|
Files: 3rd-parties/SingleApplication/README.md
|
||||||
|
Copyright: 2020, KylinSoft Co., Ltd.
|
||||||
|
License: Expat
|
||||||
|
|
||||||
|
Files: 3rd-parties/SingleApplication/singleapplication.cpp
|
||||||
|
3rd-parties/SingleApplication/singleapplication.h
|
||||||
|
3rd-parties/SingleApplication/singleapplication_p.cpp
|
||||||
|
3rd-parties/SingleApplication/singleapplication_p.h
|
||||||
|
Copyright: 2015-2018, Itay Grudev
|
||||||
|
License: Expat
|
||||||
|
|
||||||
|
Files: 3rd-parties/qtsingleapplication/*
|
||||||
|
Copyright: 2013, Digia Plc and/or its subsidiary(-ies).
|
||||||
|
License: BSD-3-clause
|
||||||
|
|
||||||
|
Files: 3rd-parties/qtsingleapplication/QtLockedFile
|
||||||
|
3rd-parties/qtsingleapplication/QtSingleApplication
|
||||||
|
Copyright: 2020, KylinSoft Co., Ltd.
|
||||||
|
License: GPL-3+
|
||||||
|
|
||||||
|
Files: frontend/*
|
||||||
|
Copyright: 2020-2022, KylinSoft Co., Ltd.
|
||||||
|
License: GPL-3+
|
||||||
|
|
||||||
|
Files: frontend/control/flow-layout/*
|
||||||
|
Copyright: 2019, Tianjin KYLIN Information Technology Co., Ltd.
|
||||||
|
License: GPL-2+
|
||||||
|
|
||||||
|
Files: frontend/model/best-list-model.h
|
||||||
|
frontend/model/web-search-model.h
|
||||||
|
Copyright: 2020, KylinSoft Co., Ltd.
|
||||||
|
License: GPL-3+
|
||||||
|
|
||||||
|
Files: frontend/ukui-search-dbus-service.cpp
|
||||||
|
frontend/ukui-search-dbus-service.h
|
||||||
|
frontend/ukui-search-gui.cpp
|
||||||
|
frontend/ukui-search-gui.h
|
||||||
|
Copyright: 2020, KylinSoft Co., Ltd.
|
||||||
|
License: GPL-3+
|
||||||
|
|
||||||
|
Files: frontend/view/*
|
||||||
|
Copyright: 2020, KylinSoft Co., Ltd.
|
||||||
|
License: GPL-3+
|
||||||
|
|
||||||
|
Files: frontend/view/result-view-delegate.h
|
||||||
|
frontend/view/web-search-view.cpp
|
||||||
|
Copyright: 2020-2022, KylinSoft Co., Ltd.
|
||||||
|
License: GPL-3+
|
||||||
|
|
||||||
|
Files: libchinese-segmentation/*
|
||||||
|
Copyright: 2020-2022, KylinSoft Co., Ltd.
|
||||||
|
License: GPL-3+
|
||||||
|
|
||||||
|
Files: libchinese-segmentation/chinese-segmentation-private.h
|
||||||
|
libchinese-segmentation/common-struct.h
|
||||||
|
Copyright: 2020, KylinSoft Co., Ltd.
|
||||||
|
License: GPL-3+
|
||||||
|
|
||||||
|
Files: libchinese-segmentation/cppjieba/*
|
||||||
|
Copyright: 2020, KylinSoft Co., Ltd.
|
||||||
|
License: GPL-3+
|
||||||
|
|
||||||
|
Files: libchinese-segmentation/cppjieba/idf-trie/*
|
||||||
|
Copyright: 2020-2022, KylinSoft Co., Ltd.
|
||||||
|
License: GPL-3+
|
||||||
|
|
||||||
|
Files: libchinese-segmentation/cppjieba/limonp/Md5.hpp
|
||||||
|
Copyright: 1991, 1992, RSA Data Security, Inc. Created 1991
|
||||||
|
License: NTP
|
||||||
|
|
||||||
|
Files: libchinese-segmentation/cppjieba/segment-trie/*
|
||||||
|
Copyright: 2020-2022, KylinSoft Co., Ltd.
|
||||||
|
License: GPL-3+
|
||||||
|
|
||||||
|
Files: libchinese-segmentation/development-files/*
|
||||||
|
Copyright: 2020, KylinSoft Co., Ltd.
|
||||||
|
License: GPL-3+
|
||||||
|
|
||||||
|
Files: libchinese-segmentation/dict/*
|
||||||
|
Copyright: 2020, KylinSoft Co., Ltd.
|
||||||
|
License: GPL-3+
|
||||||
|
|
||||||
|
Files: libchinese-segmentation/storage-base/cedar/*
|
||||||
|
Copyright: 2009-2015, Naoki Yoshinaga <ynaga@tkl.iis.u-tokyo.ac.jp>
|
||||||
|
License: GPL-3+
|
||||||
|
|
||||||
|
Files: libchinese-segmentation/storage-base/darts-clone/*
|
||||||
|
Copyright: 2020, KylinSoft Co., Ltd.
|
||||||
|
License: GPL-3+
|
||||||
|
|
||||||
|
Files: libchinese-segmentation/test/*
|
||||||
|
Copyright: 2020, KylinSoft Co., Ltd.
|
||||||
|
License: GPL-3+
|
||||||
|
|
||||||
|
Files: libsearch/appsearch/app-match.cpp
|
||||||
|
libsearch/appsearch/app-match.h
|
||||||
|
Copyright: 2020-2022, KylinSoft Co., Ltd.
|
||||||
|
License: GPL-3+
|
||||||
|
|
||||||
|
Files: libsearch/file-utils.cpp
|
||||||
|
libsearch/file-utils.h
|
||||||
|
libsearch/global-settings.cpp
|
||||||
|
libsearch/global-settings.h
|
||||||
|
libsearch/gobject-template.cpp
|
||||||
|
libsearch/gobject-template.h
|
||||||
|
libsearch/libsearch.cpp
|
||||||
|
libsearch/libsearch.h
|
||||||
|
libsearch/libsearch_global.h
|
||||||
|
Copyright: 2020-2022, KylinSoft Co., Ltd.
|
||||||
|
License: GPL-3+
|
||||||
|
|
||||||
|
Files: libsearch/filesystemwatcher/*
|
||||||
|
Copyright: 2020-2022, KylinSoft Co., Ltd.
|
||||||
|
License: GPL-3+
|
||||||
|
|
||||||
|
Files: libsearch/index/*
|
||||||
|
Copyright: 2020-2022, KylinSoft Co., Ltd.
|
||||||
|
License: GPL-3+
|
||||||
|
|
||||||
|
Files: libsearch/index/compatible-define.h
|
||||||
|
libsearch/index/data-queue.cpp
|
||||||
|
libsearch/index/data-queue.h
|
||||||
|
libsearch/index/database.cpp
|
||||||
|
libsearch/index/database.h
|
||||||
|
libsearch/index/ocrobject.cpp
|
||||||
|
libsearch/index/ocrobject.h
|
||||||
|
Copyright: 2020, KylinSoft Co., Ltd.
|
||||||
|
License: GPL-3+
|
||||||
|
|
||||||
|
Files: libsearch/plugininterface/action-label.cpp
|
||||||
|
libsearch/plugininterface/action-label.h
|
||||||
|
Copyright: 2020-2022, KylinSoft Co., Ltd.
|
||||||
|
License: GPL-3+
|
||||||
|
|
||||||
|
Files: ukui-search-app-data-service/convert-winid-to-desktop.cpp
|
||||||
|
ukui-search-app-data-service/convert-winid-to-desktop.h
|
||||||
|
Copyright: 2019, Tianjin KYLIN Information Technology Co., Ltd.
|
||||||
|
License: GPL-3+
|
||||||
|
|
||||||
|
Files: ukui-search-service-dir-manager/dirwatcher/dir-watcher-adaptor.cpp
|
||||||
|
ukui-search-service-dir-manager/dirwatcher/dir-watcher-adaptor.h
|
||||||
|
Copyright: 2020, The Qt Company Ltd.
|
||||||
|
License: GPL-3+
|
||||||
|
|
||||||
|
Files: ukui-search-service-dir-manager/main.cpp
|
||||||
|
Copyright: 2019, Tianjin KYLIN Information Technology Co., Ltd.
|
||||||
|
License: GPL-2+
|
||||||
|
|
||||||
|
Files: ukui-search-service/*
|
||||||
|
Copyright: 2020-2022, KylinSoft Co., Ltd.
|
||||||
|
License: GPL-3+
|
||||||
|
|
||||||
|
Files: ukui-search-service/ukui-search-service.h
|
||||||
|
Copyright: 2020, KylinSoft Co., Ltd.
|
||||||
|
License: GPL-3+
|
||||||
|
|
||||||
|
Files: ukuisearch-systemdbus/*
|
||||||
|
Copyright: 2019, Tianjin KYLIN Information Technology Co., Ltd.
|
||||||
|
License: GPL-2+
|
||||||
|
|
||||||
|
License: BSD-3-clause
|
||||||
|
This software is Copyright (c) 2021 by foo.
|
||||||
|
This is free software, licensed under:
|
||||||
|
The (three-clause) BSD License
|
||||||
|
The BSD License
|
||||||
|
Redistribution and use in source and binary forms, with or without
|
||||||
|
modification, are permitted provided that the following conditions are
|
||||||
|
met:
|
||||||
|
* Redistributions of source code must retain the above copyright
|
||||||
|
notice, this list of conditions and the following disclaimer.
|
||||||
|
* Redistributions in binary form must reproduce the above copyright
|
||||||
|
notice, this list of conditions and the following disclaimer in the
|
||||||
|
documentation and/or other materials provided with the distribution.
|
||||||
|
* Neither the name of foo nor the names of its
|
||||||
|
contributors may be used to endorse or promote products derived from
|
||||||
|
this software without specific prior written permission.
|
||||||
|
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
|
||||||
|
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
|
||||||
|
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
|
||||||
|
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
|
||||||
|
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||||
|
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||||
|
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||||
|
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
||||||
|
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||||
|
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||||
|
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||||
|
|
||||||
|
License: Expat
|
||||||
|
The MIT License
|
||||||
|
.
|
||||||
|
Permission is hereby granted, free of charge, to any person
|
||||||
|
obtaining a copy of this software and associated
|
||||||
|
documentation files (the "Software"), to deal in the Software
|
||||||
|
without restriction, including without limitation the rights to
|
||||||
|
use, copy, modify, merge, publish, distribute, sublicense,
|
||||||
|
and/or sell copies of the Software, and to permit persons to
|
||||||
|
whom the Software is furnished to do so, subject to the
|
||||||
|
following conditions:
|
||||||
|
.
|
||||||
|
The above copyright notice and this permission notice shall
|
||||||
|
be included in all copies or substantial portions of the
|
||||||
|
Software.
|
||||||
|
.
|
||||||
|
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT
|
||||||
|
WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
|
||||||
|
INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
||||||
|
MERCHANTABILITY, FITNESS FOR A PARTICULAR
|
||||||
|
PURPOSE AND NONINFRINGEMENT. IN NO EVENT
|
||||||
|
SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
|
||||||
|
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||||
|
LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
|
||||||
|
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
|
||||||
|
CONNECTION WITH THE SOFTWARE OR THE USE OR
|
||||||
|
OTHER DEALINGS IN THE SOFTWARE.
|
||||||
|
|
||||||
|
License: GPL-2+
|
||||||
|
This software is Copyright (c) 2021 by foo.
|
||||||
|
This is free software, licensed under:
|
||||||
|
The GNU General Public License, Version 2, June 1991
|
||||||
|
This program is free software; you can redistribute it and/or modify
|
||||||
|
it under the terms of the GNU General Public License as published by
|
||||||
|
the Free Software Foundation; version 2 dated June, 1991, or (at
|
||||||
|
your option) any later version.
|
||||||
|
On Debian systems, the complete text of version 2 of the GNU General
|
||||||
|
Public License can be found in '/usr/share/common-licenses/GPL-2'.
|
||||||
|
|
||||||
|
License: GPL-3+
|
||||||
|
This software is Copyright (c) 2021 by foo.
|
||||||
|
This is free software, licensed under:
|
||||||
|
The GNU General Public License, Version 3, June 2007
|
||||||
|
This program is free software; you can redistribute it and/or modify
|
||||||
|
it under the terms of the GNU General Public License as published by
|
||||||
|
the Free Software Foundation; version 3 dated June, 2007, or (at
|
||||||
|
your option) any later version.
|
||||||
|
On Debian systems, the complete text of version 3 of the GNU General
|
||||||
|
Public License can be found in '/usr/share/common-licenses/GPL-3'.
|
||||||
|
|
||||||
|
License: NTP
|
||||||
|
Copyright (c) (CopyrightHoldersName) (From 4-digit-year)-(To 4-digit-year)
|
||||||
|
Permission to use, copy, modify, and distribute this software and
|
||||||
|
its documentation for any purpose with or without fee is hereby
|
||||||
|
granted, provided that the above copyright notice appears in all
|
||||||
|
copies and that both the copyright notice and this permission
|
||||||
|
notice appear in supporting documentation, and that the name
|
||||||
|
(TrademarkedName) not be used in advertising or publicity
|
||||||
|
pertaining to distribution of the software without specific,
|
||||||
|
written prior permission. (TrademarkedName) makes no
|
||||||
|
representations about the suitability this software for any
|
||||||
|
purpose. It is provided “as is” without express or implied
|
||||||
|
warranty.
|
|
@ -0,0 +1,3 @@
|
||||||
|
usr/include/chinese-seg/*
|
||||||
|
usr/lib/*/pkgconfig/chinese-segmentation.pc
|
||||||
|
usr/lib/*/libchinese-segmentation.so
|
|
@ -0,0 +1,3 @@
|
||||||
|
usr/lib/*/libchinese-segmentation.so.*
|
||||||
|
/usr/share/ukui-search/res/dict/*.utf8
|
||||||
|
/usr/share/ukui-search/res/dict/*.txt
|
|
@ -0,0 +1,3 @@
|
||||||
|
usr/include/ukui-search/*
|
||||||
|
usr/lib/*/pkgconfig/ukui-search.pc
|
||||||
|
usr/lib/*/libukui-search.so
|
|
@ -0,0 +1,11 @@
|
||||||
|
usr/lib/*/libukui-search.so.*
|
||||||
|
usr/bin/ukui-search-service
|
||||||
|
usr/bin/ukui-search-app-data-service
|
||||||
|
usr/bin/ukui-search-service-dir-manager
|
||||||
|
etc/xdg/autostart/ukui-search-service-dir-manager.desktop
|
||||||
|
etc/xdg/autostart/ukui-search-app-data-service.desktop
|
||||||
|
etc/xdg/autostart/ukui-search-service.desktop
|
||||||
|
usr/share/dbus-1/services/com.ukui.search.appdb.service
|
||||||
|
usr/share/dbus-1/services/com.ukui.search.fileindex.service
|
||||||
|
usr/share/glib-2.0/schemas/org.ukui.search.data.gschema.xml
|
||||||
|
libsearch/.qm/*.qm usr/share/ukui-search/translations
|
|
@ -0,0 +1,5 @@
|
||||||
|
#!/usr/bin/make -f
|
||||||
|
export DEB_BUILD_MAINT_OPTIONS = hardening=+all
|
||||||
|
|
||||||
|
%:
|
||||||
|
dh $@
|
|
@ -0,0 +1 @@
|
||||||
|
3.0 (native)
|
|
@ -0,0 +1,3 @@
|
||||||
|
/usr/share/dbus-1/system-services/com.ukui.search.qt.systemdbus.service
|
||||||
|
/usr/share/dbus-1/system.d/com.ukui.search.qt.systemdbus.conf
|
||||||
|
/usr/bin/ukui-search-systemdbus
|
|
@ -0,0 +1,12 @@
|
||||||
|
usr/bin/ukui-search
|
||||||
|
etc/xdg/autostart/ukui-search.desktop
|
||||||
|
usr/share/applications/ukui-search-menu.desktop
|
||||||
|
frontend/.qm/*.qm usr/share/ukui-search/translations
|
||||||
|
usr/share/glib-2.0/schemas/org.ukui.log4qt.ukui-search.gschema.xml
|
||||||
|
usr/lib/*/ukui-control-center/*
|
||||||
|
usr/share/ukui-search/search-ukcc-plugin/translations/*
|
||||||
|
search-ukcc-plugin/.qm/*.qm usr/share/ukui-search/search-ukcc-plugin/translations
|
||||||
|
usr/share/ukui-search/search-ukcc-plugin/image/*
|
||||||
|
|
||||||
|
usr/share/dbus-1/services/org.ukui.appwidget.provider.search.service
|
||||||
|
/usr/share/appwidget/*
|
|
@ -20,7 +20,6 @@
|
||||||
*/
|
*/
|
||||||
#include "search-line-edit.h"
|
#include "search-line-edit.h"
|
||||||
#include <KWindowEffects>
|
#include <KWindowEffects>
|
||||||
#include <QApplication>
|
|
||||||
#include <QPainterPath>
|
#include <QPainterPath>
|
||||||
|
|
||||||
QT_BEGIN_NAMESPACE
|
QT_BEGIN_NAMESPACE
|
||||||
|
@ -92,7 +91,7 @@ void SearchLineEdit::paintEvent(QPaintEvent *e)
|
||||||
QPainter p(this);
|
QPainter p(this);
|
||||||
p.setRenderHint(QPainter::Antialiasing); // anti-aliasing
|
p.setRenderHint(QPainter::Antialiasing); // anti-aliasing
|
||||||
p.setBrush(palette().base());
|
p.setBrush(palette().base());
|
||||||
p.setOpacity(GlobalSettings::getInstance().getValue(TRANSPARENCY_KEY).toDouble());
|
p.setOpacity(GlobalSettings::getInstance()->getValue(TRANSPARENCY_KEY).toDouble());
|
||||||
p.setPen(Qt::NoPen);
|
p.setPen(Qt::NoPen);
|
||||||
p.drawRoundedRect(this->rect(), 12, 12);
|
p.drawRoundedRect(this->rect(), 12, 12);
|
||||||
return QLineEdit::paintEvent(e);
|
return QLineEdit::paintEvent(e);
|
||||||
|
@ -100,7 +99,6 @@ void SearchLineEdit::paintEvent(QPaintEvent *e)
|
||||||
|
|
||||||
void SearchLineEdit::focusOutEvent(QFocusEvent *e)
|
void SearchLineEdit::focusOutEvent(QFocusEvent *e)
|
||||||
{
|
{
|
||||||
Q_UNUSED(e)
|
|
||||||
this->setFocus();
|
this->setFocus();
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
|
@ -630,6 +630,49 @@ QString escapeHtml(const QString & str) {
|
||||||
return temp;
|
return temp;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
void DetailWidget::setWidgetInfo(const QString &plugin_name, const SearchPluginIface::ResultInfo &info)
|
||||||
|
{
|
||||||
|
// clearLayout(m_descFrameLyt);
|
||||||
|
// clearLayout(m_previewFrameLyt);
|
||||||
|
// if(SearchPluginManager::getInstance()->getPlugin(plugin_name)->isPreviewEnable(info.actionKey,info.type)) {
|
||||||
|
// m_iconLabel->hide();
|
||||||
|
// m_previewFrameLyt->addWidget(SearchPluginManager::getInstance()->getPlugin(plugin_name)->previewPage(info.actionKey,info.type, m_previewFrame), 0 , Qt::AlignHCenter);
|
||||||
|
// m_previewFrameLyt->setContentsMargins(0,0,0,0);
|
||||||
|
// m_previewFrame->show();
|
||||||
|
// } else {
|
||||||
|
// m_previewFrame->hide();
|
||||||
|
// m_iconLabel->setPixmap(info.icon.pixmap(info.icon.actualSize(ICON_SIZE)));
|
||||||
|
// m_iconLabel->show();
|
||||||
|
// }
|
||||||
|
// QFontMetrics fontMetrics = m_nameLabel->fontMetrics();
|
||||||
|
// QString name = fontMetrics.elidedText(info.name, Qt::ElideRight, NAME_LABEL_WIDTH - 8);
|
||||||
|
// m_nameLabel->setText(QString("<h3 style=\"font-weight:normal;\">%1</h3>").arg(escapeHtml(name)));
|
||||||
|
// m_nameLabel->setToolTip(info.name);
|
||||||
|
// m_pluginLabel->setText(plugin_name);
|
||||||
|
// m_nameFrame->show();
|
||||||
|
// m_line_1->show();
|
||||||
|
|
||||||
|
// if (info.description.length() > 0) {
|
||||||
|
// //NEW_TODO style needs optimization
|
||||||
|
// clearLayout(m_descFrameLyt);
|
||||||
|
// Q_FOREACH (SearchPluginIface::DescriptionInfo desc, info.description) {
|
||||||
|
// QLabel * descLabel = new QLabel(m_descFrame);
|
||||||
|
// descLabel->setTextFormat(Qt::PlainText);
|
||||||
|
// descLabel->setWordWrap(true);
|
||||||
|
// QString show_desc = desc.key + " " + desc.value;
|
||||||
|
// descLabel->setText(show_desc);
|
||||||
|
// m_descFrameLyt->addWidget(descLabel);
|
||||||
|
// }
|
||||||
|
// m_descFrame->show();
|
||||||
|
// m_line_2->show();
|
||||||
|
// }
|
||||||
|
// clearLayout(m_actionFrameLyt);
|
||||||
|
// Q_FOREACH (SearchPluginIface::Actioninfo actioninfo, SearchPluginManager::getInstance()->getPlugin(plugin_name)->getActioninfo(info.type)) {
|
||||||
|
// ActionLabel * actionLabel = new ActionLabel(actioninfo.displayName, info.actionKey, actioninfo.actionkey, plugin_name, info.type, m_actionFrame);
|
||||||
|
// m_actionFrameLyt->addWidget(actionLabel);
|
||||||
|
// }
|
||||||
|
// m_actionFrame->show();
|
||||||
|
}
|
||||||
|
|
||||||
void DetailWidget::updateDetailPage(const QString &plugin_name, const SearchPluginIface::ResultInfo &info)
|
void DetailWidget::updateDetailPage(const QString &plugin_name, const SearchPluginIface::ResultInfo &info)
|
||||||
{
|
{
|
||||||
|
@ -654,6 +697,71 @@ void DetailWidget::updateDetailPage(const QString &plugin_name, const SearchPlug
|
||||||
m_currentPluginId = plugin_name;
|
m_currentPluginId = plugin_name;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
void DetailWidget::clear()
|
||||||
|
{
|
||||||
|
// m_iconLabel->hide();
|
||||||
|
// m_nameFrame->hide();
|
||||||
|
// m_line_1->hide();
|
||||||
|
// m_descFrame->hide();
|
||||||
|
// m_line_2->hide();
|
||||||
|
// m_actionFrame->hide();
|
||||||
|
}
|
||||||
|
|
||||||
|
void DetailWidget::initUi()
|
||||||
|
{
|
||||||
|
// this->setFixedSize(368, 516);
|
||||||
|
// m_mainLyt = new QVBoxLayout(this);
|
||||||
|
// this->setLayout(m_mainLyt);
|
||||||
|
// m_mainLyt->setContentsMargins(DETAIL_WIDGET_MARGINS);
|
||||||
|
// m_mainLyt->setAlignment(Qt::AlignHCenter);
|
||||||
|
|
||||||
|
// m_iconLabel = new QLabel(this);
|
||||||
|
// m_iconLabel->setFixedHeight(DETAIL_ICON_HEIGHT);
|
||||||
|
// m_iconLabel->setAlignment(Qt::AlignCenter);
|
||||||
|
// m_previewFrame = new QFrame(this);
|
||||||
|
// m_previewFrameLyt = new QHBoxLayout(m_previewFrame);
|
||||||
|
|
||||||
|
// m_nameFrame = new QFrame(this);
|
||||||
|
// m_nameFrameLyt = new QHBoxLayout(m_nameFrame);
|
||||||
|
// m_nameFrame->setLayout(m_nameFrameLyt);
|
||||||
|
// m_nameFrameLyt->setContentsMargins(DETAIL_FRAME_MARGINS);
|
||||||
|
// m_nameLabel = new QLabel(m_nameFrame);
|
||||||
|
// m_nameLabel->setMaximumWidth(NAME_LABEL_WIDTH);
|
||||||
|
// m_pluginLabel = new QLabel(m_nameFrame);
|
||||||
|
// m_pluginLabel->setEnabled(false);
|
||||||
|
// m_nameFrameLyt->addWidget(m_nameLabel);
|
||||||
|
// m_nameFrameLyt->addStretch();
|
||||||
|
// m_nameFrameLyt->addWidget(m_pluginLabel);
|
||||||
|
|
||||||
|
// m_line_1 = new QFrame(this);
|
||||||
|
// m_line_1->setFixedHeight(1);
|
||||||
|
// m_line_1->setLineWidth(0);
|
||||||
|
// m_line_1->setStyleSheet(LINE_STYLE);
|
||||||
|
// m_line_2 = new QFrame(this);
|
||||||
|
// m_line_2->setFixedHeight(1);
|
||||||
|
// m_line_2->setLineWidth(0);
|
||||||
|
// m_line_2->setStyleSheet(LINE_STYLE);
|
||||||
|
|
||||||
|
// m_descFrame = new QFrame(this);
|
||||||
|
// m_descFrameLyt = new QVBoxLayout(m_descFrame);
|
||||||
|
// m_descFrame->setLayout(m_descFrameLyt);
|
||||||
|
// m_descFrameLyt->setContentsMargins(DETAIL_FRAME_MARGINS);
|
||||||
|
|
||||||
|
// m_actionFrame = new QFrame(this);
|
||||||
|
// m_actionFrameLyt = new QVBoxLayout(m_actionFrame);
|
||||||
|
// m_actionFrame->setLayout(m_actionFrameLyt);
|
||||||
|
// m_actionFrameLyt->setContentsMargins(DETAIL_FRAME_MARGINS);
|
||||||
|
|
||||||
|
// m_mainLyt->addWidget(m_iconLabel);
|
||||||
|
// m_mainLyt->addWidget(m_previewFrame, 0, Qt::AlignHCenter);
|
||||||
|
// m_mainLyt->addWidget(m_nameFrame);
|
||||||
|
// m_mainLyt->addWidget(m_line_1);
|
||||||
|
// m_mainLyt->addWidget(m_descFrame);
|
||||||
|
// m_mainLyt->addWidget(m_line_2);
|
||||||
|
// m_mainLyt->addWidget(m_actionFrame);
|
||||||
|
// m_mainLyt->addStretch();
|
||||||
|
}
|
||||||
|
|
||||||
void DetailWidget::paintEvent(QPaintEvent *event)
|
void DetailWidget::paintEvent(QPaintEvent *event)
|
||||||
{
|
{
|
||||||
QStyleOption opt;
|
QStyleOption opt;
|
||||||
|
@ -683,6 +791,53 @@ void DetailWidget::clearLayout(QLayout *layout)
|
||||||
child = NULL;
|
child = NULL;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
//ActionLabel::ActionLabel(const QString &action, const QString &key, const int &ActionKey, const QString &pluginId, const int type, QWidget *parent) : QLabel(parent)
|
||||||
|
//{
|
||||||
|
// m_action = action;
|
||||||
|
// m_key = key;
|
||||||
|
// m_actionKey = ActionKey;
|
||||||
|
// m_type = type;
|
||||||
|
// m_pluginId = pluginId;
|
||||||
|
// this->initUi();
|
||||||
|
// this->installEventFilter(this);
|
||||||
|
//}
|
||||||
|
|
||||||
|
//void ActionLabel::initUi()
|
||||||
|
//{
|
||||||
|
// this->setText(m_action);
|
||||||
|
// QPalette pal = palette();
|
||||||
|
// pal.setColor(QPalette::WindowText, ACTION_NORMAL_COLOR);
|
||||||
|
// pal.setColor(QPalette::Light, ACTION_HOVER_COLOR);
|
||||||
|
// pal.setColor(QPalette::Dark, ACTION_PRESS_COLOR);
|
||||||
|
// this->setPalette(pal);
|
||||||
|
// this->setForegroundRole(QPalette::WindowText);
|
||||||
|
// this->setCursor(QCursor(Qt::PointingHandCursor));
|
||||||
|
//}
|
||||||
|
|
||||||
|
//bool ActionLabel::eventFilter(QObject *watched, QEvent *event)
|
||||||
|
//{
|
||||||
|
// if (watched == this) {
|
||||||
|
// if(event->type() == QEvent::MouseButtonPress) {
|
||||||
|
// this->setForegroundRole(QPalette::Dark);
|
||||||
|
// return true;
|
||||||
|
// } else if(event->type() == QEvent::MouseButtonRelease) {
|
||||||
|
// SearchPluginIface *plugin = SearchPluginManager::getInstance()->getPlugin(m_pluginId);
|
||||||
|
// if (plugin)
|
||||||
|
// plugin->openAction(m_actionKey, m_key, m_type);
|
||||||
|
// else
|
||||||
|
// qWarning()<<"Get plugin failed!";
|
||||||
|
// this->setForegroundRole(QPalette::Light);
|
||||||
|
// return true;
|
||||||
|
// } else if(event->type() == QEvent::Enter) {
|
||||||
|
// this->setForegroundRole(QPalette::Light);
|
||||||
|
// return true;
|
||||||
|
// } else if(event->type() == QEvent::Leave) {
|
||||||
|
// this->setForegroundRole(QPalette::WindowText);
|
||||||
|
// return true;
|
||||||
|
// }
|
||||||
|
// }
|
||||||
|
//}
|
||||||
|
|
||||||
ResultScrollBar::ResultScrollBar(QWidget *parent) : QScrollBar(parent)
|
ResultScrollBar::ResultScrollBar(QWidget *parent) : QScrollBar(parent)
|
||||||
{
|
{
|
||||||
|
|
||||||
|
|
|
@ -28,6 +28,7 @@
|
||||||
#include "result-view.h"
|
#include "result-view.h"
|
||||||
#include "search-plugin-iface.h"
|
#include "search-plugin-iface.h"
|
||||||
#include "best-list-view.h"
|
#include "best-list-view.h"
|
||||||
|
#include "web-search-view.h"
|
||||||
|
|
||||||
namespace UkuiSearch {
|
namespace UkuiSearch {
|
||||||
class ResultScrollBar : public QScrollBar
|
class ResultScrollBar : public QScrollBar
|
||||||
|
@ -114,16 +115,32 @@ class DetailWidget : public QWidget
|
||||||
public:
|
public:
|
||||||
DetailWidget(QWidget *parent = nullptr);
|
DetailWidget(QWidget *parent = nullptr);
|
||||||
~DetailWidget() = default;
|
~DetailWidget() = default;
|
||||||
|
void clear();
|
||||||
|
|
||||||
public Q_SLOTS:
|
public Q_SLOTS:
|
||||||
|
void setWidgetInfo(const QString &plugin_name, const SearchPluginIface::ResultInfo &info);
|
||||||
void updateDetailPage(const QString &plugin_name, const SearchPluginIface::ResultInfo &info);
|
void updateDetailPage(const QString &plugin_name, const SearchPluginIface::ResultInfo &info);
|
||||||
protected:
|
protected:
|
||||||
void paintEvent(QPaintEvent *event);
|
void paintEvent(QPaintEvent *event);
|
||||||
private:
|
private:
|
||||||
|
void initUi();
|
||||||
void clearLayout(QLayout *);
|
void clearLayout(QLayout *);
|
||||||
QVBoxLayout * m_mainLyt = nullptr;
|
QVBoxLayout * m_mainLyt = nullptr;
|
||||||
QString m_currentPluginId;
|
QString m_currentPluginId;
|
||||||
QWidget *m_detailPage = nullptr;
|
QWidget *m_detailPage = nullptr;
|
||||||
|
// QLabel * m_iconLabel = nullptr;
|
||||||
|
// QFrame *m_previewFrame = nullptr;
|
||||||
|
// QHBoxLayout *m_previewFrameLyt = nullptr;
|
||||||
|
// QFrame * m_nameFrame = nullptr;
|
||||||
|
// QHBoxLayout * m_nameFrameLyt = nullptr;
|
||||||
|
// QLabel * m_nameLabel = nullptr;
|
||||||
|
// QLabel * m_pluginLabel = nullptr;
|
||||||
|
// QFrame * m_line_1 = nullptr;
|
||||||
|
// QFrame * m_descFrame = nullptr;
|
||||||
|
// QVBoxLayout * m_descFrameLyt = nullptr;
|
||||||
|
// QFrame * m_line_2 = nullptr;
|
||||||
|
// QFrame * m_actionFrame = nullptr;
|
||||||
|
// QVBoxLayout * m_actionFrameLyt = nullptr;
|
||||||
};
|
};
|
||||||
|
|
||||||
class DetailArea : public QScrollArea
|
class DetailArea : public QScrollArea
|
||||||
|
@ -139,6 +156,24 @@ private:
|
||||||
Q_SIGNALS:
|
Q_SIGNALS:
|
||||||
void setWidgetInfo(const QString&, const SearchPluginIface::ResultInfo&);
|
void setWidgetInfo(const QString&, const SearchPluginIface::ResultInfo&);
|
||||||
};
|
};
|
||||||
|
|
||||||
|
//class ActionLabel : public QLabel
|
||||||
|
//{
|
||||||
|
// Q_OBJECT
|
||||||
|
//public:
|
||||||
|
// ActionLabel(const QString &action, const QString &key, const int &ActionKey, const QString &pluginId, const int type = 0, QWidget *parent = nullptr);
|
||||||
|
// ~ActionLabel() = default;
|
||||||
|
//private:
|
||||||
|
// void initUi();
|
||||||
|
// QString m_action;
|
||||||
|
// QString m_key;
|
||||||
|
// int m_actionKey;
|
||||||
|
// int m_type = 0;
|
||||||
|
// QString m_pluginId;
|
||||||
|
|
||||||
|
//protected:
|
||||||
|
// bool eventFilter(QObject *, QEvent *);
|
||||||
|
//};
|
||||||
}
|
}
|
||||||
|
|
||||||
#endif // SEARCHPAGESECTION_H
|
#endif // SEARCHPAGESECTION_H
|
||||||
|
|
|
@ -19,7 +19,6 @@
|
||||||
*
|
*
|
||||||
*/
|
*/
|
||||||
#include "search-result-page.h"
|
#include "search-result-page.h"
|
||||||
#include "global-settings.h"
|
|
||||||
#include <QPainterPath>
|
#include <QPainterPath>
|
||||||
QT_BEGIN_NAMESPACE
|
QT_BEGIN_NAMESPACE
|
||||||
extern void qt_blurImage(QImage &blurImage, qreal radius, bool quality, int transposed);
|
extern void qt_blurImage(QImage &blurImage, qreal radius, bool quality, int transposed);
|
||||||
|
@ -36,6 +35,11 @@ SearchResultPage::SearchResultPage(QWidget *parent) : QWidget(parent)
|
||||||
setInternalPlugins();
|
setInternalPlugins();
|
||||||
}
|
}
|
||||||
|
|
||||||
|
void SearchResultPage::setSize(const int&width, const int&height)
|
||||||
|
{
|
||||||
|
// m_splitter->setFixedSize(width, height);
|
||||||
|
}
|
||||||
|
|
||||||
void SearchResultPage::setInternalPlugins()
|
void SearchResultPage::setInternalPlugins()
|
||||||
{
|
{
|
||||||
QList<PluginInfo> infoList = SearchPluginManager::getInstance()->getPluginIds();
|
QList<PluginInfo> infoList = SearchPluginManager::getInstance()->getPluginIds();
|
||||||
|
@ -125,11 +129,10 @@ void SearchResultPage::setWidth(int width)
|
||||||
|
|
||||||
void SearchResultPage::paintEvent(QPaintEvent *event)
|
void SearchResultPage::paintEvent(QPaintEvent *event)
|
||||||
{
|
{
|
||||||
Q_UNUSED(event)
|
|
||||||
QPainter p(this);
|
QPainter p(this);
|
||||||
p.setRenderHint(QPainter::Antialiasing);
|
p.setRenderHint(QPainter::Antialiasing);
|
||||||
p.setBrush(palette().base());
|
p.setBrush(palette().base());
|
||||||
p.setOpacity(GlobalSettings::getInstance().getValue(TRANSPARENCY_KEY).toDouble());
|
p.setOpacity(GlobalSettings::getInstance()->getValue(TRANSPARENCY_KEY).toDouble());
|
||||||
p.setPen(Qt::NoPen);
|
p.setPen(Qt::NoPen);
|
||||||
p.drawRoundedRect(this->rect().adjusted(10,10,-10,-10), 12, 12);
|
p.drawRoundedRect(this->rect().adjusted(10,10,-10,-10), 12, 12);
|
||||||
|
|
||||||
|
|
|
@ -31,6 +31,7 @@ class SearchResultPage : public QWidget
|
||||||
public:
|
public:
|
||||||
explicit SearchResultPage(QWidget *parent = nullptr);
|
explicit SearchResultPage(QWidget *parent = nullptr);
|
||||||
~SearchResultPage() = default;
|
~SearchResultPage() = default;
|
||||||
|
void setSize(const int&, const int&);
|
||||||
void setInternalPlugins();
|
void setInternalPlugins();
|
||||||
void appendPlugin(const QString &plugin_id);
|
void appendPlugin(const QString &plugin_id);
|
||||||
void movePlugin(const QString &plugin_id, int index);
|
void movePlugin(const QString &plugin_id, int index);
|
||||||
|
|
|
@ -10,7 +10,6 @@ TEMPLATE = app
|
||||||
PKGCONFIG += gio-2.0 glib-2.0 gio-unix-2.0 kysdk-waylandhelper
|
PKGCONFIG += gio-2.0 glib-2.0 gio-unix-2.0 kysdk-waylandhelper
|
||||||
CONFIG += c++11 link_pkgconfig no_keywords lrelease
|
CONFIG += c++11 link_pkgconfig no_keywords lrelease
|
||||||
LIBS += -lxapian -lgsettings-qt -lquazip5 -lX11
|
LIBS += -lxapian -lgsettings-qt -lquazip5 -lX11
|
||||||
LIBS += -lukui-appwidget-manager -lukui-appwidget-provider
|
|
||||||
#LIBS += -lukui-log4qt
|
#LIBS += -lukui-log4qt
|
||||||
# The following define makes your compiler emit warnings if you use
|
# The following define makes your compiler emit warnings if you use
|
||||||
# any Qt feature that has been marked deprecated (the exact warnings
|
# any Qt feature that has been marked deprecated (the exact warnings
|
||||||
|
@ -28,7 +27,6 @@ include(model/model.pri)
|
||||||
include(xatom/xatom.pri)
|
include(xatom/xatom.pri)
|
||||||
include(../3rd-parties/qtsingleapplication/qtsingleapplication.pri)
|
include(../3rd-parties/qtsingleapplication/qtsingleapplication.pri)
|
||||||
include(view/view.pri)
|
include(view/view.pri)
|
||||||
include(search-app-widget-plugin/search-app-widget-plugin.pri)
|
|
||||||
|
|
||||||
|
|
||||||
SOURCES += \
|
SOURCES += \
|
||||||
|
@ -56,44 +54,20 @@ data.files += ../data/ukui-search.desktop
|
||||||
INSTALLS += data data-menu
|
INSTALLS += data data-menu
|
||||||
|
|
||||||
RESOURCES += \
|
RESOURCES += \
|
||||||
resource.qrc \
|
resource.qrc
|
||||||
search-app-widget-plugin/provider/src.qrc
|
|
||||||
|
|
||||||
TRANSLATIONS += \
|
TRANSLATIONS += \
|
||||||
../translations/ukui-search/zh_CN.ts \
|
../translations/ukui-search/zh_CN.ts \
|
||||||
../translations/ukui-search/tr.ts \
|
../translations/ukui-search/tr.ts \
|
||||||
../translations/ukui-search/bo_CN.ts \
|
../translations/ukui-search/bo_CN.ts
|
||||||
../translations/ukui-search/appwidget/search_zh_CN.ts \
|
|
||||||
../translations/ukui-search/appwidget/search_bo_CN.ts
|
|
||||||
|
|
||||||
qm_files.path = /usr/share/ukui-search/translations/
|
qm_files.path = /usr/share/ukui-search/translations/
|
||||||
qm_files.files = $$OUT_PWD/.qm/zh_CN.qm \
|
qm_files.files = $$OUT_PWD/.qm/*.qm
|
||||||
$$OUT_PWD/.qm/bo_CN.qm \
|
|
||||||
$$OUT_PWD/.qm/tr.qm \
|
|
||||||
|
|
||||||
schemes.path = /usr/share/glib-2.0/schemas/
|
schemes.path = /usr/share/glib-2.0/schemas/
|
||||||
schemes.files += ../data/org.ukui.log4qt.ukui-search.gschema.xml
|
schemes.files += ../data/org.ukui.log4qt.ukui-search.gschema.xml
|
||||||
|
|
||||||
appwidget_qm_files.files = $$OUT_PWD/.qm/search_bo_CN.qm \
|
INSTALLS += qm_files schemes
|
||||||
$$OUT_PWD/.qm/search_zh_CN.qm
|
|
||||||
appwidget_qm_files.path = /usr/share/appwidget/translations/
|
|
||||||
|
|
||||||
qml.files += search-app-widget-plugin/provider/data/search.qml
|
|
||||||
qml.path = /usr/share/appwidget/qml/
|
|
||||||
|
|
||||||
appwidgetconf.files += search-app-widget-plugin/provider/data/search.conf
|
|
||||||
appwidgetconf.path = /usr/share/appwidget/config/
|
|
||||||
|
|
||||||
service.files += search-app-widget-plugin/provider/org.ukui.appwidget.provider.search.service
|
|
||||||
service.path += /usr/share/dbus-1/services/
|
|
||||||
|
|
||||||
preview.files += search-app-widget-plugin/provider/data/search.png
|
|
||||||
preview.path = /usr/share/appwidget/search/
|
|
||||||
|
|
||||||
svg.files += search-app-widget-plugin/provider/data/ukui-search.svg
|
|
||||||
svg.path = /usr/share/appwidget/search/
|
|
||||||
|
|
||||||
INSTALLS += qm_files schemes qml appwidget_qm_files appwidgetconf service preview svg
|
|
||||||
|
|
||||||
LIBS += -L$$OUT_PWD/../libchinese-segmentation -lchinese-segmentation \
|
LIBS += -L$$OUT_PWD/../libchinese-segmentation -lchinese-segmentation \
|
||||||
-L$$OUT_PWD/../libsearch -lukui-search
|
-L$$OUT_PWD/../libsearch -lukui-search
|
||||||
|
|
|
@ -21,17 +21,76 @@
|
||||||
*
|
*
|
||||||
*/
|
*/
|
||||||
|
|
||||||
#include <unistd.h>
|
|
||||||
|
#include <QDesktopWidget>
|
||||||
|
#include <QFile>
|
||||||
|
#include <QDir>
|
||||||
#include <syslog.h>
|
#include <syslog.h>
|
||||||
#include <KWindowSystem>
|
#if (QT_VERSION >= QT_VERSION_CHECK(5, 12, 0))
|
||||||
|
#include <ukui-log4qt.h>
|
||||||
|
#endif
|
||||||
|
#include <QObject>
|
||||||
|
#include <QApplication>
|
||||||
|
#include <QX11Info>
|
||||||
#include "ukui-search-gui.h"
|
#include "ukui-search-gui.h"
|
||||||
#include "log-utils.h"
|
|
||||||
|
|
||||||
using namespace UkuiSearch;
|
using namespace UkuiSearch;
|
||||||
|
|
||||||
|
void messageOutput(QtMsgType type, const QMessageLogContext &context, const QString &msg)
|
||||||
|
{
|
||||||
|
QByteArray localMsg = msg.toLocal8Bit();
|
||||||
|
QByteArray currentTime = QTime::currentTime().toString().toLocal8Bit();
|
||||||
|
|
||||||
|
bool showDebug = true;
|
||||||
|
// QString logFilePath = QStandardPaths::writableLocation(QStandardPaths::TempLocation) + "/ukui-search.log";
|
||||||
|
// QString logFilePath = QStandardPaths::writableLocation(QStandardPaths::HomeLocation) + "/.config/org.ukui/ukui-search/ukui-search.log";
|
||||||
|
QString logFilePath = QStandardPaths::writableLocation(QStandardPaths::HomeLocation) + "/.config/org.ukui/ukui-search.log";
|
||||||
|
if (!QFile::exists(logFilePath)) {
|
||||||
|
showDebug = false;
|
||||||
|
}
|
||||||
|
FILE *log_file = nullptr;
|
||||||
|
|
||||||
|
if (showDebug) {
|
||||||
|
log_file = fopen(logFilePath.toLocal8Bit().constData(), "a+");
|
||||||
|
}
|
||||||
|
|
||||||
|
const char *file = context.file ? context.file : "";
|
||||||
|
const char *function = context.function ? context.function : "";
|
||||||
|
switch (type) {
|
||||||
|
case QtDebugMsg:
|
||||||
|
if (!log_file) {
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
fprintf(log_file, "Debug: %s: %s (%s:%u, %s)\n", currentTime.constData(), localMsg.constData(), file, context.line, function);
|
||||||
|
break;
|
||||||
|
case QtInfoMsg:
|
||||||
|
fprintf(log_file? log_file: stdout, "Info: %s: %s (%s:%u, %s)\n", currentTime.constData(), localMsg.constData(), file, context.line, function);
|
||||||
|
break;
|
||||||
|
case QtWarningMsg:
|
||||||
|
fprintf(log_file? log_file: stderr, "Warning: %s: %s (%s:%u, %s)\n", currentTime.constData(), localMsg.constData(), file, context.line, function);
|
||||||
|
break;
|
||||||
|
case QtCriticalMsg:
|
||||||
|
fprintf(log_file? log_file: stderr, "Critical: %s: %s (%s:%u, %s)\n", currentTime.constData(), localMsg.constData(), file, context.line, function);
|
||||||
|
break;
|
||||||
|
case QtFatalMsg:
|
||||||
|
fprintf(log_file? log_file: stderr, "Fatal: %s: %s (%s:%u, %s)\n", currentTime.constData(), localMsg.constData(), file, context.line, function);
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (log_file)
|
||||||
|
fclose(log_file);
|
||||||
|
}
|
||||||
|
|
||||||
int main(int argc, char *argv[]) {
|
int main(int argc, char *argv[]) {
|
||||||
|
// v101 log module
|
||||||
|
//#if (QT_VERSION >= QT_VERSION_CHECK(5, 12, 0))
|
||||||
|
// //Init log module
|
||||||
|
// initUkuiLog4qt("ukui-search");
|
||||||
|
//#endif
|
||||||
|
|
||||||
// Determine whether the home directory has been created, and if not, keep waiting.
|
// Determine whether the home directory has been created, and if not, keep waiting.
|
||||||
char *p_home = NULL;
|
char *p_home = NULL;
|
||||||
|
|
||||||
unsigned int i = 0;
|
unsigned int i = 0;
|
||||||
while(p_home == NULL) {
|
while(p_home == NULL) {
|
||||||
::sleep(1);
|
::sleep(1);
|
||||||
|
@ -52,8 +111,14 @@ int main(int argc, char *argv[]) {
|
||||||
}
|
}
|
||||||
|
|
||||||
// Output log to file
|
// Output log to file
|
||||||
LogUtils::initLogFile("ukui-search");
|
qInstallMessageHandler(messageOutput);
|
||||||
qInstallMessageHandler(LogUtils::messageOutput);
|
// If the v101 log module is used, the following conditional can be uncommented
|
||||||
|
//#if (QT_VERSION < QT_VERSION_CHECK(5, 12, 0))
|
||||||
|
// // Output log to file
|
||||||
|
// qInstallMessageHandler(messageOutput);
|
||||||
|
//#endif
|
||||||
|
|
||||||
|
// Register meta type
|
||||||
qDebug() << "ukui-search main start";
|
qDebug() << "ukui-search main start";
|
||||||
// If qt version bigger than 5.12, enable high dpi scaling and use high dpi pixmaps?
|
// If qt version bigger than 5.12, enable high dpi scaling and use high dpi pixmaps?
|
||||||
#if (QT_VERSION >= QT_VERSION_CHECK(5, 12, 0))
|
#if (QT_VERSION >= QT_VERSION_CHECK(5, 12, 0))
|
||||||
|
@ -63,13 +128,7 @@ int main(int argc, char *argv[]) {
|
||||||
#if (QT_VERSION >= QT_VERSION_CHECK(5, 14, 0))
|
#if (QT_VERSION >= QT_VERSION_CHECK(5, 14, 0))
|
||||||
QApplication::setHighDpiScaleFactorRoundingPolicy(Qt::HighDpiScaleFactorRoundingPolicy::PassThrough);
|
QApplication::setHighDpiScaleFactorRoundingPolicy(Qt::HighDpiScaleFactorRoundingPolicy::PassThrough);
|
||||||
#endif
|
#endif
|
||||||
QString display;
|
UkuiSearchGui app(argc, argv, QString("ukui-search-gui-%1").arg(QX11Info::appScreen()));
|
||||||
if(KWindowSystem::isPlatformWayland()) {
|
|
||||||
display = getenv("WAYLAND_DISPLAY");
|
|
||||||
} else if (KWindowSystem::isPlatformX11()) {
|
|
||||||
display = getenv("DISPLAY");
|
|
||||||
}
|
|
||||||
UkuiSearchGui app(argc, argv, QString("ukui-search-gui-%1").arg(display));
|
|
||||||
if (app.isRunning())
|
if (app.isRunning())
|
||||||
return 0;
|
return 0;
|
||||||
|
|
||||||
|
|
|
@ -50,7 +50,7 @@
|
||||||
|
|
||||||
#define MAIN_SETTINGS QDir::homePath() + "/.config/org.ukui/ukui-search/ukui-search.conf"
|
#define MAIN_SETTINGS QDir::homePath() + "/.config/org.ukui/ukui-search/ukui-search.conf"
|
||||||
#define ENABLE_CREATE_INDEX_ASK_DIALOG "enable_create_index_ask_dialog"
|
#define ENABLE_CREATE_INDEX_ASK_DIALOG "enable_create_index_ask_dialog"
|
||||||
const static QString FILE_INDEX_ENABLE_KEY = "fileIndexEnable";
|
|
||||||
|
|
||||||
using namespace UkuiSearch;
|
using namespace UkuiSearch;
|
||||||
extern void qt_blurImage(QImage &blurImage, qreal radius, bool quality, int transposed);
|
extern void qt_blurImage(QImage &blurImage, qreal radius, bool quality, int transposed);
|
||||||
|
@ -81,6 +81,7 @@ MainWindow::MainWindow(QWidget *parent) :
|
||||||
installEventFilter(this);
|
installEventFilter(this);
|
||||||
initConnections();
|
initConnections();
|
||||||
|
|
||||||
|
|
||||||
// connect(KWindowSystem::self(), &KWindowSystem::activeWindowChanged, this,[&](WId activeWindowId){
|
// connect(KWindowSystem::self(), &KWindowSystem::activeWindowChanged, this,[&](WId activeWindowId){
|
||||||
// qDebug() << "activeWindowChanged!!!" << activeWindowId;
|
// qDebug() << "activeWindowChanged!!!" << activeWindowId;
|
||||||
// if (activeWindowId != this->winId()) {
|
// if (activeWindowId != this->winId()) {
|
||||||
|
@ -90,12 +91,9 @@ MainWindow::MainWindow(QWidget *parent) :
|
||||||
|
|
||||||
m_appWidgetPlugin = new AppWidgetPlugin;
|
m_appWidgetPlugin = new AppWidgetPlugin;
|
||||||
|
|
||||||
// connect(m_appWidgetPlugin, &AppWidgetPlugin::startSearch, this, [ & ] (QString keyword){
|
connect(m_appWidgetPlugin, &AppWidgetPlugin::startSearch, this, [ & ] (QString keyword){
|
||||||
// this->bootOptionsFilter("-s");
|
|
||||||
// this->setText(keyword);
|
|
||||||
// });
|
|
||||||
connect(m_appWidgetPlugin, &AppWidgetPlugin::start, this, [&] {
|
|
||||||
this->bootOptionsFilter("-s");
|
this->bootOptionsFilter("-s");
|
||||||
|
this->setText(keyword);
|
||||||
});
|
});
|
||||||
connect(ActionTransmiter::getInstance(), &ActionTransmiter::hideUIAction, this, &MainWindow::tryHideMainwindow);
|
connect(ActionTransmiter::getInstance(), &ActionTransmiter::hideUIAction, this, &MainWindow::tryHideMainwindow);
|
||||||
}
|
}
|
||||||
|
@ -105,10 +103,10 @@ MainWindow::~MainWindow() {
|
||||||
delete m_askDialog;
|
delete m_askDialog;
|
||||||
m_askDialog = NULL;
|
m_askDialog = NULL;
|
||||||
}
|
}
|
||||||
// if(m_askTimer) {
|
if(m_askTimer) {
|
||||||
// delete m_askTimer;
|
delete m_askTimer;
|
||||||
// m_askTimer = NULL;
|
m_askTimer = NULL;
|
||||||
// }
|
}
|
||||||
if(m_searchGsettings) {
|
if(m_searchGsettings) {
|
||||||
delete m_searchGsettings;
|
delete m_searchGsettings;
|
||||||
m_searchGsettings = NULL;
|
m_searchGsettings = NULL;
|
||||||
|
@ -136,7 +134,8 @@ void MainWindow::initUi() {
|
||||||
void MainWindow::initConnections()
|
void MainWindow::initConnections()
|
||||||
{
|
{
|
||||||
connect(m_sys_tray_icon, &QSystemTrayIcon::activated, this, &MainWindow::trayIconActivatedSlot);
|
connect(m_sys_tray_icon, &QSystemTrayIcon::activated, this, &MainWindow::trayIconActivatedSlot);
|
||||||
connect(QApplication::primaryScreen(), &QScreen::geometryChanged, this, &MainWindow::ScreenGeometryChanged);
|
connect(QApplication::primaryScreen(), &QScreen::geometryChanged, this, &MainWindow::monitorResolutionChange);
|
||||||
|
connect(qApp, &QApplication::primaryScreenChanged, this, &MainWindow::primaryScreenChangedSlot);
|
||||||
connect(m_askDialog, &CreateIndexAskDialog::closed, this, [ = ]() {
|
connect(m_askDialog, &CreateIndexAskDialog::closed, this, [ = ]() {
|
||||||
m_isAskDialogVisible = false;
|
m_isAskDialogVisible = false;
|
||||||
});
|
});
|
||||||
|
@ -164,8 +163,8 @@ void MainWindow::bootOptionsFilter(QString opt) {
|
||||||
clearSearchResult();
|
clearSearchResult();
|
||||||
centerToScreen(this);
|
centerToScreen(this);
|
||||||
this->m_searchBarWidget->setFocus();
|
this->m_searchBarWidget->setFocus();
|
||||||
|
this->activateWindow();
|
||||||
}
|
}
|
||||||
this->activateWindow();
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -236,7 +235,7 @@ void MainWindow::searchKeywordSlot(const QString &keyword)
|
||||||
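// The dialog is allowed, the current search has not asked yet (one search session lasts until the main window is closed), and brute-force search is in use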
|
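// The dialog is allowed, the current search has not asked yet (one search session lasts until the main window is closed), and brute-force search is in use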
|
||||||
if(m_settings->value(ENABLE_CREATE_INDEX_ASK_DIALOG).toBool()
|
if(m_settings->value(ENABLE_CREATE_INDEX_ASK_DIALOG).toBool()
|
||||||
&& !m_currentSearchAsked
|
&& !m_currentSearchAsked
|
||||||
&& !m_isIndexSearch) {
|
&& GlobalSettings::getInstance()->getValue(FILE_INDEX_ENABLE_KEY).toBool() == false) {
|
||||||
m_askTimer->start();
|
m_askTimer->start();
|
||||||
}
|
}
|
||||||
Q_EMIT m_searchResultPage->startSearch(keyword);
|
Q_EMIT m_searchResultPage->startSearch(keyword);
|
||||||
|
@ -259,10 +258,73 @@ void MainWindow::tryHide()
|
||||||
this->tryHideMainwindow();
|
this->tryHideMainwindow();
|
||||||
}
|
}
|
||||||
|
|
||||||
void MainWindow::ScreenGeometryChanged(QRect rect) {
|
/**
|
||||||
|
* @brief monitorResolutionChange listens for screen changes
|
||||||
|
* @param rect
|
||||||
|
*/
|
||||||
|
void MainWindow::monitorResolutionChange(QRect rect) {
|
||||||
Q_UNUSED(rect);
|
Q_UNUSED(rect);
|
||||||
if(this->isVisible()) {
|
}
|
||||||
centerToScreen(this);
|
|
||||||
|
/**
|
||||||
|
* @brief primaryScreenChangedSlot listens for resolution changes
|
||||||
|
* @param screen
|
||||||
|
*/
|
||||||
|
void MainWindow::primaryScreenChangedSlot(QScreen *screen) {
|
||||||
|
Q_UNUSED(screen);
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* @brief MainWindow::moveToPanel moves the main window next to the panel (following the panel position)
|
||||||
|
*/
|
||||||
|
void MainWindow::moveToPanel() {
|
||||||
|
QRect availableGeometry = qApp->primaryScreen()->availableGeometry();
|
||||||
|
QRect screenGeometry = qApp->primaryScreen()->geometry();
|
||||||
|
|
||||||
|
QDBusInterface primaryScreenInterface("org.ukui.SettingsDaemon",
|
||||||
|
"/org/ukui/SettingsDaemon/wayland",
|
||||||
|
"org.ukui.SettingsDaemon.wayland",
|
||||||
|
QDBusConnection::sessionBus());
|
||||||
|
if(QDBusReply<int>(primaryScreenInterface.call("x")).isValid()) {
|
||||||
|
QDBusReply<int> x = primaryScreenInterface.call("x");
|
||||||
|
QDBusReply<int> y = primaryScreenInterface.call("y");
|
||||||
|
QDBusReply<int> width = primaryScreenInterface.call("width");
|
||||||
|
QDBusReply<int> height = primaryScreenInterface.call("height");
|
||||||
|
screenGeometry.setX(x);
|
||||||
|
screenGeometry.setY(y);
|
||||||
|
screenGeometry.setWidth(width);
|
||||||
|
screenGeometry.setHeight(height);
|
||||||
|
availableGeometry.setX(x);
|
||||||
|
availableGeometry.setY(y);
|
||||||
|
availableGeometry.setWidth(width);
|
||||||
|
availableGeometry.setHeight(height);
|
||||||
|
}
|
||||||
|
|
||||||
|
QDesktopWidget * desktopWidget = QApplication::desktop();
|
||||||
|
QRect screenMainRect = desktopWidget->screenGeometry(0); // get the device screen size
|
||||||
|
|
||||||
|
QDBusInterface interface("com.ukui.panel.desktop",
|
||||||
|
"/",
|
||||||
|
"com.ukui.panel.desktop",
|
||||||
|
QDBusConnection::sessionBus());
|
||||||
|
|
||||||
|
int position = QDBusReply<int>(interface.call("GetPanelPosition", "position"));
|
||||||
|
int height = QDBusReply<int>(interface.call("GetPanelSize", "height"));
|
||||||
|
int d = 8; // distance from the window edge to the panel
|
||||||
|
|
||||||
|
if(position == 0) {
|
||||||
|
// panel is at the bottom
|
||||||
|
this->move(availableGeometry.x() + availableGeometry.width() - this->width() - d, screenGeometry.y() + screenGeometry.height() - this->height() - height - d);
|
||||||
|
} else if(position == 1) {
|
||||||
|
// panel is at the top
|
||||||
|
this->move(availableGeometry.x() + availableGeometry.width() - this->width() - d, screenGeometry.y() + height + d);
|
||||||
|
} else if(position == 2) {
|
||||||
|
// panel is on the left
|
||||||
|
this->move(screenGeometry.x() + height + d, screenGeometry.y() + screenGeometry.height() - this->height() - d);
|
||||||
|
} else if(position == 3) {
|
||||||
|
// panel is on the right
|
||||||
|
this->move(screenGeometry.x() + screenGeometry.width() - this->width() - height - d, screenGeometry.y() + screenGeometry.height() - this->height() - d);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
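Note on the `moveToPanel` hunk above: its four branches compute the window's top-left corner from the panel position code (0 bottom, 1 top, 2 left, 3 right, per the comments in the hunk), the panel size, and a fixed margin. The following is a minimal Qt-free sketch of that arithmetic only. `Rect`, `Point`, and `panelAnchoredTopLeft` are hypothetical names introduced for illustration and are not part of ukui-search; the sketch also assumes that the value returned by `GetPanelSize` is the panel thickness in pixels.

```cpp
#include <cassert>

// Hypothetical plain-data stand-ins for QRect / QPoint so the sketch builds without Qt.
struct Rect { int x, y, width, height; };
struct Point { int x, y; };

// Computes the top-left corner of a window placed next to the panel.
// position: 0 = bottom, 1 = top, 2 = left, 3 = right (as in the hunk above).
// Returns false and leaves `out` untouched for any other value, mirroring
// the original code, which does not move the window in that case.
bool panelAnchoredTopLeft(const Rect &screen, const Rect &available,
                          int winWidth, int winHeight,
                          int position, int panelSize, int margin,
                          Point &out)
{
    assert(winWidth >= 0 && winHeight >= 0 && panelSize >= 0);
    switch (position) {
    case 0: // bottom: right-aligned, sitting above the panel
        out = Point{available.x + available.width - winWidth - margin,
                    screen.y + screen.height - winHeight - panelSize - margin};
        return true;
    case 1: // top: right-aligned, sitting below the panel
        out = Point{available.x + available.width - winWidth - margin,
                    screen.y + panelSize + margin};
        return true;
    case 2: // left: bottom-aligned, to the right of the panel
        out = Point{screen.x + panelSize + margin,
                    screen.y + screen.height - winHeight - margin};
        return true;
    case 3: // right: bottom-aligned, to the left of the panel
        out = Point{screen.x + screen.width - winWidth - panelSize - margin,
                    screen.y + screen.height - winHeight - margin};
        return true;
    default:
        return false;
    }
}
```

Keeping the arithmetic in a pure function like this would make the four cases checkable without a running panel or D-Bus session; whether that split fits the project is a design choice left to the maintainers.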
@ -274,10 +336,12 @@ void MainWindow::centerToScreen(QWidget* widget) {
|
||||||
if(!widget)
|
if(!widget)
|
||||||
return;
|
return;
|
||||||
KWindowSystem::setState(this->winId(),NET::SkipTaskbar | NET::SkipPager);
|
KWindowSystem::setState(this->winId(),NET::SkipTaskbar | NET::SkipPager);
|
||||||
QRect desk_rect = qApp->screenAt(QCursor::pos())->geometry();
|
QDesktopWidget* m = QApplication::desktop();
|
||||||
|
QRect desk_rect = m->screenGeometry(m->screenNumber(QCursor::pos()));
|
||||||
int desk_x = desk_rect.width();
|
int desk_x = desk_rect.width();
|
||||||
int desk_y = desk_rect.height();
|
int desk_y = desk_rect.height();
|
||||||
int x = widget->width();
|
int x = widget->width();
|
||||||
|
int y = widget->height();
|
||||||
widget->show();
|
widget->show();
|
||||||
kdk::WindowManager::setGeometry(this->windowHandle(),QRect(desk_x / 2 - x / 2 + desk_rect.left(),
|
kdk::WindowManager::setGeometry(this->windowHandle(),QRect(desk_x / 2 - x / 2 + desk_rect.left(),
|
||||||
desk_y / 3 + desk_rect.top(),
|
desk_y / 3 + desk_rect.top(),
|
||||||
|
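Note on the `centerToScreen` hunk above: both columns place the window horizontally centered on the screen under the cursor and one third of the way down from its top edge; they differ only in how `desk_rect` is obtained. A minimal Qt-free sketch of the visible x/y arithmetic follows. `Rect`, `Point`, and `centeredTopLeft` are hypothetical names used for illustration, and the width and height arguments of the original `QRect` are not shown in the hunk, so they are left out here.

```cpp
#include <cassert>

// Hypothetical stand-ins for QRect / QPoint so the sketch builds without Qt.
struct Rect { int left, top, width, height; };
struct Point { int x, y; };

// Top-left corner for a widget of the given width: horizontally centered on
// `desk`, vertically one third of the way down, offset by the screen origin.
Point centeredTopLeft(const Rect &desk, int widgetWidth)
{
    assert(desk.width >= 0 && desk.height >= 0 && widgetWidth >= 0);
    Point p;
    p.x = desk.width / 2 - widgetWidth / 2 + desk.left; // integer division, as in the hunk
    p.y = desk.height / 3 + desk.top;
    return p;
}
```

Adding `desk.left` and `desk.top` is what keeps the window on the correct monitor in a multi-screen layout, since the second screen's origin is not (0, 0).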
@ -294,13 +358,10 @@ void MainWindow::initSettings() {
|
||||||
const QByteArray id(UKUI_SEARCH_SCHEMAS);
|
const QByteArray id(UKUI_SEARCH_SCHEMAS);
|
||||||
if(QGSettings::isSchemaInstalled(id)) {
|
if(QGSettings::isSchemaInstalled(id)) {
|
||||||
m_searchGsettings = new QGSettings(id);
|
m_searchGsettings = new QGSettings(id);
|
||||||
if (m_searchGsettings->keys().contains(FILE_INDEX_ENABLE_KEY)) {
|
|
||||||
m_isIndexSearch = m_searchGsettings->get(FILE_INDEX_ENABLE_KEY).toBool();
|
|
||||||
}
|
|
||||||
connect(m_searchGsettings, &QGSettings::changed, this, [ = ](const QString & key) {
|
connect(m_searchGsettings, &QGSettings::changed, this, [ = ](const QString & key) {
|
||||||
if(key == FILE_INDEX_ENABLE_KEY) {
|
if(key == FILE_INDEX_ENABLE_KEY) {
|
||||||
m_isIndexSearch = m_searchGsettings->get(FILE_INDEX_ENABLE_KEY).toBool();
|
bool isIndexSearch = m_searchGsettings->get(FILE_INDEX_ENABLE_KEY).toBool();
|
||||||
if(m_researchTimer->isActive() && !m_isIndexSearch) {
|
if(m_researchTimer->isActive() && !isIndexSearch) {
|
||||||
m_researchTimer->stop();
|
m_researchTimer->stop();
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
@ -311,24 +372,14 @@ void MainWindow::initSettings() {
|
||||||
|
|
||||||
// Use GSettings to get the transparency the current window should use
|
// Use GSettings to get the transparency the current window should use
|
||||||
double MainWindow::getTransparentData() {
|
double MainWindow::getTransparentData() {
|
||||||
return GlobalSettings::getInstance().getValue(TRANSPARENCY_KEY).toDouble();
|
return GlobalSettings::getInstance()->getValue(TRANSPARENCY_KEY).toDouble();
|
||||||
}
|
}
|
||||||
|
|
||||||
void MainWindow::initTimer() {
|
void MainWindow::initTimer() {
|
||||||
m_askTimer = new QTimer(this);
|
m_askTimer = new QTimer;
|
||||||
m_askTimer->setInterval(ASK_INDEX_TIME);
|
m_askTimer->setInterval(ASK_INDEX_TIME);
|
||||||
connect(m_askTimer, &QTimer::timeout, this, [ = ]() {
|
connect(m_askTimer, &QTimer::timeout, this, [ = ]() {
|
||||||
QWindow *modal = QGuiApplication::modalWindow();
|
if(this->isVisible()) {
|
||||||
if(modal) {
|
|
||||||
m_askTimer->stop();
|
|
||||||
connect(modal, &QWindow::visibleChanged, this, [ & ](bool visible){
|
|
||||||
if(!visible) {
|
|
||||||
m_askTimer->start();
|
|
||||||
}
|
|
||||||
}, Qt::UniqueConnection);
|
|
||||||
return;
|
|
||||||
}
|
|
||||||
if(this->isVisible() && !m_isIndexSearch) {
|
|
||||||
m_isAskDialogVisible = true;
|
m_isAskDialogVisible = true;
|
||||||
kdk::UkuiStyleHelper::self()->removeHeader(m_askDialog);
|
kdk::UkuiStyleHelper::self()->removeHeader(m_askDialog);
|
||||||
m_askDialog->show();
|
m_askDialog->show();
|
||||||
|
@ -353,9 +404,8 @@ void MainWindow::initTimer() {
|
||||||
m_askTimer->stop();
|
m_askTimer->stop();
|
||||||
} else {
|
} else {
|
||||||
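// The dialog is allowed, the current search (the time until the main window is closed counts as one search) has not asked yet, and brute-force search is in use
|
// The dialog is allowed, the current search (the time until the main window is closed counts as one search) has not asked yet, and brute-force search is in use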
|
||||||
if(m_settings->value(ENABLE_CREATE_INDEX_ASK_DIALOG, true).toBool() && !m_currentSearchAsked && !m_isIndexSearch) {
|
if(m_settings->value(ENABLE_CREATE_INDEX_ASK_DIALOG, true).toBool() && !m_currentSearchAsked && GlobalSettings::getInstance()->getValue(FILE_INDEX_ENABLE_KEY).toBool() == false)
|
||||||
m_askTimer->start();
|
m_askTimer->start();
|
||||||
}
|
|
||||||
}
|
}
|
||||||
});
|
});
|
||||||
}
|
}
|
||||||
|
@ -418,9 +468,8 @@ void MainWindow::keyPressEvent(QKeyEvent *event)
|
||||||
return QWidget::keyPressEvent(event);
|
return QWidget::keyPressEvent(event);
|
||||||
}
|
}
|
||||||
|
|
||||||
void MainWindow::paintEvent(QPaintEvent *event)
|
void MainWindow::paintEvent(QPaintEvent *event) {
|
||||||
{
|
|
||||||
Q_UNUSED(event)
|
|
||||||
QPainterPath path;
|
QPainterPath path;
|
||||||
|
|
||||||
path.addRoundedRect(m_searchBarWidget->x()+10, m_searchBarWidget->y()+10, m_searchBarWidget->width()-20, m_searchBarWidget->height()-20, 12, 12);
|
path.addRoundedRect(m_searchBarWidget->x()+10, m_searchBarWidget->y()+10, m_searchBarWidget->width()-20, m_searchBarWidget->height()-20, 12, 12);
|
||||||
|
|
|
@ -67,6 +67,11 @@ class MainWindow : public QMainWindow {
|
||||||
public:
|
public:
|
||||||
explicit MainWindow(QWidget *parent = nullptr);
|
explicit MainWindow(QWidget *parent = nullptr);
|
||||||
~MainWindow();
|
~MainWindow();
|
||||||
|
/**
|
||||||
|
* @brief Load the main window
|
||||||
|
* The main window is shown at a position that follows the ukui-panel.
|
||||||
|
*/
|
||||||
|
void moveToPanel();
|
||||||
|
|
||||||
// The main window is shown at the center of the screen that contains the cursor.
|
// The main window is shown at the center of the screen that contains the cursor.
|
||||||
void centerToScreen(QWidget* widget);
|
void centerToScreen(QWidget* widget);
|
||||||
|
@ -82,8 +87,16 @@ public:
|
||||||
bool eventFilter(QObject *watched, QEvent *event) override;
|
bool eventFilter(QObject *watched, QEvent *event) override;
|
||||||
|
|
||||||
public Q_SLOTS:
|
public Q_SLOTS:
|
||||||
|
/**
|
||||||
void ScreenGeometryChanged(QRect rect);
|
* @brief Monitor screen resolution
|
||||||
|
* @param rect: Screen resolution
|
||||||
|
*/
|
||||||
|
void monitorResolutionChange(QRect rect);
|
||||||
|
/**
|
||||||
|
* @brief Monitor primary screen changes
|
||||||
|
* @param screen: Primary screen
|
||||||
|
*/
|
||||||
|
void primaryScreenChangedSlot(QScreen *screen);
|
||||||
void bootOptionsFilter(QString opt); // filter terminal commands
|
void bootOptionsFilter(QString opt); // filter terminal commands
|
||||||
void clearSearchResult(); // clear the search results
|
void clearSearchResult(); // clear the search results
|
||||||
void trayIconActivatedSlot(QSystemTrayIcon::ActivationReason reason);
|
void trayIconActivatedSlot(QSystemTrayIcon::ActivationReason reason);
|
||||||
|
@ -114,7 +127,7 @@ private:
|
||||||
QGSettings *m_searchGsettings = nullptr;
|
QGSettings *m_searchGsettings = nullptr;
|
||||||
QSettings *m_settings = nullptr;
|
QSettings *m_settings = nullptr;
|
||||||
AppWidgetPlugin *m_appWidgetPlugin = nullptr;
|
AppWidgetPlugin *m_appWidgetPlugin = nullptr;
|
||||||
bool m_isIndexSearch = false;
|
|
||||||
};
|
};
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
|
@ -31,7 +31,6 @@ BestListModel::BestListModel(QObject *parent)
|
||||||
|
|
||||||
QModelIndex BestListModel::index(int row, int column, const QModelIndex &parent) const
|
QModelIndex BestListModel::index(int row, int column, const QModelIndex &parent) const
|
||||||
{
|
{
|
||||||
Q_UNUSED(parent)
|
|
||||||
if(row < 0 || row > m_item->m_result_info_list.length() - 1)
|
if(row < 0 || row > m_item->m_result_info_list.length() - 1)
|
||||||
return QModelIndex();
|
return QModelIndex();
|
||||||
return createIndex(row, column, m_item);
|
return createIndex(row, column, m_item);
|
||||||
|
@ -39,7 +38,6 @@ QModelIndex BestListModel::index(int row, int column, const QModelIndex &parent)
|
||||||
|
|
||||||
QModelIndex BestListModel::parent(const QModelIndex &index) const
|
QModelIndex BestListModel::parent(const QModelIndex &index) const
|
||||||
{
|
{
|
||||||
Q_UNUSED(index)
|
|
||||||
return QModelIndex();
|
return QModelIndex();
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -98,7 +96,6 @@ QStringList BestListModel::getActions(const QModelIndex &index)
|
||||||
{
|
{
|
||||||
// if (m_item->m_result_info_list.length() > index.row() && index.row() >= 0)
|
// if (m_item->m_result_info_list.length() > index.row() && index.row() >= 0)
|
||||||
// return m_item->m_result_info_list.at(index.row()).actionList;
|
// return m_item->m_result_info_list.at(index.row()).actionList;
|
||||||
Q_UNUSED(index)
|
|
||||||
return QStringList();
|
return QStringList();
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -203,7 +200,6 @@ void BestListModel::moveInfo(const QString &pluginName, const int pos)
|
||||||
|
|
||||||
void BestListModel::startSearch(const QString &keyword)
|
void BestListModel::startSearch(const QString &keyword)
|
||||||
{
|
{
|
||||||
Q_UNUSED(keyword)
|
|
||||||
if (!m_item->m_result_info_list.isEmpty()) {
|
if (!m_item->m_result_info_list.isEmpty()) {
|
||||||
this->beginResetModel();
|
this->beginResetModel();
|
||||||
m_plugin_id_list.clear();
|
m_plugin_id_list.clear();
|
||||||
|
|
|
@ -1,22 +1,3 @@
|
||||||
/*
|
|
||||||
*
|
|
||||||
* Copyright (C) 2023, KylinSoft Co., Ltd.
|
|
||||||
*
|
|
||||||
* This program is free software: you can redistribute it and/or modify
|
|
||||||
* it under the terms of the GNU General Public License as published by
|
|
||||||
* the Free Software Foundation, either version 3 of the License, or
|
|
||||||
* (at your option) any later version.
|
|
||||||
*
|
|
||||||
* This program is distributed in the hope that it will be useful,
|
|
||||||
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
||||||
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|
||||||
* GNU General Public License for more details.
|
|
||||||
*
|
|
||||||
* You should have received a copy of the GNU General Public License
|
|
||||||
* along with this program. If not, see <https://www.gnu.org/licenses/>.
|
|
||||||
*
|
|
||||||
*
|
|
||||||
*/
|
|
||||||
#ifndef BESTLISTMODEL_H
|
#ifndef BESTLISTMODEL_H
|
||||||
#define BESTLISTMODEL_H
|
#define BESTLISTMODEL_H
|
||||||
|
|
||||||
|
|
|
@ -19,7 +19,6 @@
|
||||||
*
|
*
|
||||||
*/
|
*/
|
||||||
#include "search-result-manager.h"
|
#include "search-result-manager.h"
|
||||||
#include <QDeadlineTimer>
|
|
||||||
|
|
||||||
using namespace UkuiSearch;
|
using namespace UkuiSearch;
|
||||||
SearchResultManager::SearchResultManager(const QString& plugin_id, QObject *parent) : QObject(parent)
|
SearchResultManager::SearchResultManager(const QString& plugin_id, QObject *parent) : QObject(parent)
|
||||||
|
@ -48,10 +47,10 @@ void SearchResultManager::stopSearch()
|
||||||
{
|
{
|
||||||
if(m_getResultThread->isRunning()) {
|
if(m_getResultThread->isRunning()) {
|
||||||
m_getResultThread->stop();
|
m_getResultThread->stop();
|
||||||
|
SearchPluginIface *plugin = SearchPluginManager::getInstance()->getPlugin(m_pluginId);
|
||||||
|
plugin->stopSearch();
|
||||||
|
qDebug() << m_pluginId << "stopped";
|
||||||
}
|
}
|
||||||
SearchPluginIface *plugin = SearchPluginManager::getInstance()->getPlugin(m_pluginId);
|
|
||||||
plugin->stopSearch();
|
|
||||||
qDebug() << m_pluginId << "stopped";
|
|
||||||
}
|
}
|
||||||
|
|
||||||
void SearchResultManager::initConnections()
|
void SearchResultManager::initConnections()
|
||||||
|
@ -72,18 +71,24 @@ void ReceiveResultThread::stop()
|
||||||
|
|
||||||
void ReceiveResultThread::run()
|
void ReceiveResultThread::run()
|
||||||
{
|
{
|
||||||
QDeadlineTimer deadline(3000);
|
QTimer *timer = new QTimer;
|
||||||
|
timer->setInterval(3000);
|
||||||
|
|
||||||
while(!isInterruptionRequested()) {
|
while(!isInterruptionRequested()) {
|
||||||
SearchPluginIface::ResultInfo oneResult = m_resultQueue->tryDequeue();
|
SearchPluginIface::ResultInfo oneResult = m_resultQueue->tryDequeue();
|
||||||
if(oneResult.name.isEmpty()) {
|
if(oneResult.name.isEmpty()) {
|
||||||
if(deadline.remainingTime()) {
|
if(!timer->isActive()) {
|
||||||
msleep(100);
|
timer->start();
|
||||||
} else {
|
|
||||||
this->requestInterruption();
|
|
||||||
}
|
}
|
||||||
|
msleep(100);
|
||||||
} else {
|
} else {
|
||||||
deadline.setRemainingTime(3000);
|
timer->stop();
|
||||||
Q_EMIT gotResultInfo(oneResult);
|
Q_EMIT gotResultInfo(oneResult);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
if(timer->isActive() && timer->remainingTime() < 0.01 && m_resultQueue->isEmpty()) {
|
||||||
|
this->requestInterruption();
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
delete m_timer;
|
||||||
}
|
}
|
||||||
|
|
|
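Note on the `ReceiveResultThread::run` hunk above: both columns implement the same idea, which is to keep polling the result queue and stop once no result has arrived for 3000 ms, restarting the countdown whenever a result is received. One column expresses this with `QDeadlineTimer`, the other with a `QTimer`. The following is a minimal Qt-free sketch of that idle-deadline pattern using `std::chrono`. `IdleDeadline` and its members are hypothetical names for illustration only; the current time is passed in explicitly so the behavior can be checked without sleeping.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

// Tracks an "idle for too long" deadline that is pushed back on every result.
class IdleDeadline {
public:
    explicit IdleDeadline(std::chrono::milliseconds idleLimit,
                          Clock::time_point now = Clock::now())
        : m_limit(idleLimit), m_expiry(now + idleLimit)
    {
        assert(idleLimit.count() > 0);
    }

    // Call when a result was received: the idle countdown starts over.
    void reset(Clock::time_point now = Clock::now()) { m_expiry = now + m_limit; }

    // True once the idle limit has elapsed since construction or the last reset.
    bool hasExpired(Clock::time_point now = Clock::now()) const { return now >= m_expiry; }

private:
    std::chrono::milliseconds m_limit;
    Clock::time_point m_expiry;
};
```

In a polling loop this would be used as: on an empty dequeue, sleep briefly unless `hasExpired()` returns true, in which case stop; on a successful dequeue, call `reset()` and emit the result.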
@ -33,7 +33,6 @@ SearchResultModel::SearchResultModel(const QString &plugin_id)
|
||||||
|
|
||||||
QModelIndex SearchResultModel::index(int row, int column, const QModelIndex &parent) const
|
QModelIndex SearchResultModel::index(int row, int column, const QModelIndex &parent) const
|
||||||
{
|
{
|
||||||
Q_UNUSED(parent)
|
|
||||||
if(row < 0 || row > m_item->m_result_info_list.length() - 1)
|
if(row < 0 || row > m_item->m_result_info_list.length() - 1)
|
||||||
return QModelIndex();
|
return QModelIndex();
|
||||||
// QVector<SearchPluginIface::ResultInfo> * m_info = &m_result_info_list;
|
// QVector<SearchPluginIface::ResultInfo> * m_info = &m_result_info_list;
|
||||||
|
@ -42,7 +41,6 @@ QModelIndex SearchResultModel::index(int row, int column, const QModelIndex &par
|
||||||
|
|
||||||
QModelIndex SearchResultModel::parent(const QModelIndex &child) const
|
QModelIndex SearchResultModel::parent(const QModelIndex &child) const
|
||||||
{
|
{
|
||||||
Q_UNUSED(child)
|
|
||||||
return QModelIndex();
|
return QModelIndex();
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -137,6 +135,25 @@ const bool &SearchResultModel::isExpanded()
|
||||||
return m_isExpanded;
|
return m_isExpanded;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* @brief SearchResultModel::getActions Gets the action list
|
||||||
|
* @param index
|
||||||
|
* @return
|
||||||
|
*/
|
||||||
|
QStringList SearchResultModel::getActions(const QModelIndex &index)
|
||||||
|
{
|
||||||
|
if (m_item->m_result_info_list.length() > index.row() && index.row() >= 0)
|
||||||
|
// return m_item->m_result_info_list.at(index.row()).actionList;
|
||||||
|
return QStringList();
|
||||||
|
}
|
||||||
|
|
||||||
|
QString SearchResultModel::getKey(const QModelIndex &index)
|
||||||
|
{
|
||||||
|
if (m_item->m_result_info_list.length() > index.row() && index.row() >= 0)
|
||||||
|
// return m_item->m_result_info_list.at(index.row()).key;
|
||||||
|
return NULL;
|
||||||
|
}
|
||||||
|
|
||||||
void SearchResultModel::refresh()
|
void SearchResultModel::refresh()
|
||||||
{
|
{
|
||||||
this->beginResetModel();
|
this->beginResetModel();
|
||||||
|
|
|
@ -54,6 +54,8 @@ public:
|
||||||
const SearchPluginIface::ResultInfo & getInfo(const QModelIndex&);
|
const SearchPluginIface::ResultInfo & getInfo(const QModelIndex&);
|
||||||
void setExpanded(const bool&);
|
void setExpanded(const bool&);
|
||||||
const bool &isExpanded();
|
const bool &isExpanded();
|
||||||
|
QStringList getActions(const QModelIndex &);
|
||||||
|
QString getKey(const QModelIndex &);
|
||||||
void refresh();
|
void refresh();
|
||||||
|
|
||||||
public Q_SLOTS:
|
public Q_SLOTS:
|
||||||
|
|
|
@ -30,7 +30,6 @@ WebSearchModel::WebSearchModel(QObject *parent)
|
||||||
|
|
||||||
QModelIndex WebSearchModel::index(int row, int column, const QModelIndex &parent) const
|
QModelIndex WebSearchModel::index(int row, int column, const QModelIndex &parent) const
|
||||||
{
|
{
|
||||||
Q_UNUSED(parent)
|
|
||||||
if(row < 0 || row > m_item->m_result_info_list.length() - 1)
|
if(row < 0 || row > m_item->m_result_info_list.length() - 1)
|
||||||
return QModelIndex();
|
return QModelIndex();
|
||||||
return createIndex(row, column, m_item);
|
return createIndex(row, column, m_item);
|
||||||
|
@ -38,7 +37,6 @@ QModelIndex WebSearchModel::index(int row, int column, const QModelIndex &parent
|
||||||
|
|
||||||
QModelIndex WebSearchModel::parent(const QModelIndex &index) const
|
QModelIndex WebSearchModel::parent(const QModelIndex &index) const
|
||||||
{
|
{
|
||||||
Q_UNUSED(index)
|
|
||||||
return QModelIndex();
|
return QModelIndex();
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
|
@ -1,22 +1,3 @@
|
||||||
/*
|
|
||||||
*
|
|
||||||
* Copyright (C) 2023, KylinSoft Co., Ltd.
|
|
||||||
*
|
|
||||||
* This program is free software: you can redistribute it and/or modify
|
|
||||||
* it under the terms of the GNU General Public License as published by
|
|
||||||
* the Free Software Foundation, either version 3 of the License, or
|
|
||||||
* (at your option) any later version.
|
|
||||||
*
|
|
||||||
* This program is distributed in the hope that it will be useful,
|
|
||||||
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
||||||
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|
||||||
* GNU General Public License for more details.
|
|
||||||
*
|
|
||||||
* You should have received a copy of the GNU General Public License
|
|
||||||
* along with this program. If not, see <https://www.gnu.org/licenses/>.
|
|
||||||
*
|
|
||||||
*
|
|
||||||
*/
|
|
||||||
#ifndef WEBSEARCHMODEL_H
|
#ifndef WEBSEARCHMODEL_H
|
||||||
#define WEBSEARCHMODEL_H
|
#define WEBSEARCHMODEL_H
|
||||||
|
|
||||||
|
|
|
@ -1,22 +1,3 @@
|
||||||
/*
|
|
||||||
*
|
|
||||||
* Copyright (C) 2023, KylinSoft Co., Ltd.
|
|
||||||
*
|
|
||||||
* This program is free software: you can redistribute it and/or modify
|
|
||||||
* it under the terms of the GNU General Public License as published by
|
|
||||||
* the Free Software Foundation, either version 3 of the License, or
|
|
||||||
* (at your option) any later version.
|
|
||||||
*
|
|
||||||
* This program is distributed in the hope that it will be useful,
|
|
||||||
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
||||||
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|
||||||
* GNU General Public License for more details.
|
|
||||||
*
|
|
||||||
* You should have received a copy of the GNU General Public License
|
|
||||||
* along with this program. If not, see <https://www.gnu.org/licenses/>.
|
|
||||||
*
|
|
||||||
*
|
|
||||||
*/
|
|
||||||
#include "ukui-search-dbus-service.h"
|
#include "ukui-search-dbus-service.h"
|
||||||
|
|
||||||
using namespace UkuiSearch;
|
using namespace UkuiSearch;
|
||||||
|
|
|
@ -1,22 +1,3 @@
|
||||||
/*
|
|
||||||
*
|
|
||||||
* Copyright (C) 2023, KylinSoft Co., Ltd.
|
|
||||||
*
|
|
||||||
* This program is free software: you can redistribute it and/or modify
|
|
||||||
* it under the terms of the GNU General Public License as published by
|
|
||||||
* the Free Software Foundation, either version 3 of the License, or
|
|
||||||
* (at your option) any later version.
|
|
||||||
*
|
|
||||||
* This program is distributed in the hope that it will be useful,
|
|
||||||
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
||||||
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|
||||||
* GNU General Public License for more details.
|
|
||||||
*
|
|
||||||
* You should have received a copy of the GNU General Public License
|
|
||||||
* along with this program. If not, see <https://www.gnu.org/licenses/>.
|
|
||||||
*
|
|
||||||
*
|
|
||||||
*/
|
|
||||||
#ifndef UKUISEARCHDBUSSERVICE_H
|
#ifndef UKUISEARCHDBUSSERVICE_H
|
||||||
#define UKUISEARCHDBUSSERVICE_H
|
#define UKUISEARCHDBUSSERVICE_H
|
||||||
|
|
||||||
|
|
|
@ -1,22 +1,3 @@
|
||||||
/*
|
|
||||||
*
|
|
||||||
* Copyright (C) 2023, KylinSoft Co., Ltd.
|
|
||||||
*
|
|
||||||
* This program is free software: you can redistribute it and/or modify
|
|
||||||
* it under the terms of the GNU General Public License as published by
|
|
||||||
* the Free Software Foundation, either version 3 of the License, or
|
|
||||||
* (at your option) any later version.
|
|
||||||
*
|
|
||||||
* This program is distributed in the hope that it will be useful,
|
|
||||||
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
||||||
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|
||||||
* GNU General Public License for more details.
|
|
||||||
*
|
|
||||||
* You should have received a copy of the GNU General Public License
|
|
||||||
* along with this program. If not, see <https://www.gnu.org/licenses/>.
|
|
||||||
*
|
|
||||||
* Authors: iaom <zhangpengfei@kylinos.cn>
|
|
||||||
*/
|
|
||||||
#include "ukui-search-gui.h"
|
#include "ukui-search-gui.h"
|
||||||
#include <QScreen>
|
#include <QScreen>
|
||||||
#include <QTranslator>
|
#include <QTranslator>
|
||||||
|
|
|
@ -1,22 +1,3 @@
|
||||||
/*
|
|
||||||
*
|
|
||||||
* Copyright (C) 2023, KylinSoft Co., Ltd.
|
|
||||||
*
|
|
||||||
* This program is free software: you can redistribute it and/or modify
|
|
||||||
* it under the terms of the GNU General Public License as published by
|
|
||||||
* the Free Software Foundation, either version 3 of the License, or
|
|
||||||
* (at your option) any later version.
|
|
||||||
*
|
|
||||||
* This program is distributed in the hope that it will be useful,
|
|
||||||
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
||||||
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|
||||||
* GNU General Public License for more details.
|
|
||||||
*
|
|
||||||
* You should have received a copy of the GNU General Public License
|
|
||||||
* along with this program. If not, see <https://www.gnu.org/licenses/>.
|
|
||||||
*
|
|
||||||
* Authors: iaom <zhangpengfei@kylinos.cn>
|
|
||||||
*/
|
|
||||||
#ifndef UKUISEARCHGUI_H
|
#ifndef UKUISEARCHGUI_H
|
||||||
#define UKUISEARCHGUI_H
|
#define UKUISEARCHGUI_H
|
||||||
|
|
||||||
|
|
|
@ -1,22 +1,3 @@
|
||||||
/*
|
|
||||||
*
|
|
||||||
* Copyright (C) 2023, KylinSoft Co., Ltd.
|
|
||||||
*
|
|
||||||
* This program is free software: you can redistribute it and/or modify
|
|
||||||
* it under the terms of the GNU General Public License as published by
|
|
||||||
* the Free Software Foundation, either version 3 of the License, or
|
|
||||||
* (at your option) any later version.
|
|
||||||
*
|
|
||||||
* This program is distributed in the hope that it will be useful,
|
|
||||||
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
||||||
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|
||||||
* GNU General Public License for more details.
|
|
||||||
*
|
|
||||||
* You should have received a copy of the GNU General Public License
|
|
||||||
* along with this program. If not, see <https://www.gnu.org/licenses/>.
|
|
||||||
*
|
|
||||||
*
|
|
||||||
*/
|
|
||||||
#include "best-list-view.h"
|
#include "best-list-view.h"
|
||||||
#define MAIN_MARGINS 0,0,0,0
|
#define MAIN_MARGINS 0,0,0,0
|
||||||
#define MAIN_SPACING 0
|
#define MAIN_SPACING 0
|
||||||
|
@ -167,6 +148,21 @@ const bool &BestListView::isExpanded()
|
||||||
return m_model->isExpanded();
|
return m_model->isExpanded();
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* @brief BestListView::onMenuTriggered Slot invoked when a context-menu item is clicked
|
||||||
|
* @param action
|
||||||
|
*/
|
||||||
|
void BestListView::onMenuTriggered(QAction *action)
|
||||||
|
{
|
||||||
|
//NEW_TODO needs to be modified after the interface is adjusted
|
||||||
|
// SearchPluginIface *plugin = SearchPluginManager::getInstance()->getPlugin(m_plugin_id);
|
||||||
|
// if (plugin) {
|
||||||
|
//// plugin->openAction(action->text(), m_model->getKey(this->currentIndex()));
|
||||||
|
// } else {
|
||||||
|
// qWarning()<<"Get plugin failed!";
|
||||||
|
// }
|
||||||
|
}
|
||||||
|
|
||||||
void BestListView::mousePressEvent(QMouseEvent *event)
|
void BestListView::mousePressEvent(QMouseEvent *event)
|
||||||
{
|
{
|
||||||
m_tmpCurrentIndex = this->currentIndex();
|
m_tmpCurrentIndex = this->currentIndex();
|
||||||
|
|
|
@ -1,22 +1,3 @@
|
||||||
/*
|
|
||||||
*
|
|
||||||
* Copyright (C) 2023, KylinSoft Co., Ltd.
|
|
||||||
*
|
|
||||||
* This program is free software: you can redistribute it and/or modify
|
|
||||||
* it under the terms of the GNU General Public License as published by
|
|
||||||
* the Free Software Foundation, either version 3 of the License, or
|
|
||||||
* (at your option) any later version.
|
|
||||||
*
|
|
||||||
* This program is distributed in the hope that it will be useful,
|
|
||||||
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
||||||
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|
||||||
* GNU General Public License for more details.
|
|
||||||
*
|
|
||||||
* You should have received a copy of the GNU General Public License
|
|
||||||
* along with this program. If not, see <https://www.gnu.org/licenses/>.
|
|
||||||
*
|
|
||||||
*
|
|
||||||
*/
|
|
||||||
#ifndef BESTLISTVIEW_H
|
#ifndef BESTLISTVIEW_H
|
||||||
#define BESTLISTVIEW_H
|
#define BESTLISTVIEW_H
|
||||||
#include <QTreeView>
|
#include <QTreeView>
|
||||||
|
@ -53,6 +34,7 @@ public Q_SLOTS:
|
||||||
void onItemListChanged(const int &);
|
void onItemListChanged(const int &);
|
||||||
void setExpanded(const bool &);
|
void setExpanded(const bool &);
|
||||||
const bool &isExpanded();
|
const bool &isExpanded();
|
||||||
|
void onMenuTriggered(QAction *);
|
||||||
|
|
||||||
protected:
|
protected:
|
||||||
void mousePressEvent(QMouseEvent *event);
|
void mousePressEvent(QMouseEvent *event);
|
||||||
|
|
|
@ -1,25 +1,5 @@
|
||||||
/*
|
|
||||||
*
|
|
||||||
* Copyright (C) 2023, KylinSoft Co., Ltd.
|
|
||||||
*
|
|
||||||
* This program is free software: you can redistribute it and/or modify
|
|
||||||
* it under the terms of the GNU General Public License as published by
|
|
||||||
* the Free Software Foundation, either version 3 of the License, or
|
|
||||||
* (at your option) any later version.
|
|
||||||
*
|
|
||||||
* This program is distributed in the hope that it will be useful,
|
|
||||||
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
||||||
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|
||||||
* GNU General Public License for more details.
|
|
||||||
*
|
|
||||||
* You should have received a copy of the GNU General Public License
|
|
||||||
* along with this program. If not, see <https://www.gnu.org/licenses/>.
|
|
||||||
*
|
|
||||||
*
|
|
||||||
*/
|
|
||||||
#include "result-view-delegate.h"
|
#include "result-view-delegate.h"
|
||||||
#include <QPainterPath>
|
#include <QPainterPath>
|
||||||
#include <QApplication>
|
|
||||||
using namespace UkuiSearch;
|
using namespace UkuiSearch;
|
||||||
static ResultItemStyle *global_instance_of_item_style = nullptr;
|
static ResultItemStyle *global_instance_of_item_style = nullptr;
|
||||||
|
|
||||||
|
|
|
@ -30,6 +30,7 @@
|
||||||
#include <QSyntaxHighlighter>
|
#include <QSyntaxHighlighter>
|
||||||
#include <QTextCharFormat>
|
#include <QTextCharFormat>
|
||||||
#include <QRegExp>
|
#include <QRegExp>
|
||||||
|
#include "global-settings.h"
|
||||||
|
|
||||||
namespace UkuiSearch {
|
namespace UkuiSearch {
|
||||||
class HightLightEffectHelper : public QSyntaxHighlighter
|
class HightLightEffectHelper : public QSyntaxHighlighter
|
||||||
|
|
|
@ -1,22 +1,3 @@
|
||||||
/*
|
|
||||||
*
|
|
||||||
* Copyright (C) 2023, KylinSoft Co., Ltd.
|
|
||||||
*
|
|
||||||
* This program is free software: you can redistribute it and/or modify
|
|
||||||
* it under the terms of the GNU General Public License as published by
|
|
||||||
* the Free Software Foundation, either version 3 of the License, or
|
|
||||||
* (at your option) any later version.
|
|
||||||
*
|
|
||||||
* This program is distributed in the hope that it will be useful,
|
|
||||||
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
||||||
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|
||||||
* GNU General Public License for more details.
|
|
||||||
*
|
|
||||||
* You should have received a copy of the GNU General Public License
|
|
||||||
* along with this program. If not, see <https://www.gnu.org/licenses/>.
|
|
||||||
*
|
|
||||||
*
|
|
||||||
*/
|
|
||||||
#include "result-view.h"
|
#include "result-view.h"
|
||||||
#define MAIN_MARGINS 0,0,0,0
|
#define MAIN_MARGINS 0,0,0,0
|
||||||
#define MAIN_SPACING 0
|
#define MAIN_SPACING 0
|
||||||
|
@ -325,7 +306,6 @@ const bool &ResultView::isExpanded()
|
||||||
*/
|
*/
|
||||||
void ResultView::onMenuTriggered(QAction *action)
|
void ResultView::onMenuTriggered(QAction *action)
|
||||||
{
|
{
|
||||||
Q_UNUSED(action)
|
|
||||||
//NEW_TODO needs to be modified after the interface is adjusted
|
//NEW_TODO needs to be modified after the interface is adjusted
|
||||||
SearchPluginIface *plugin = SearchPluginManager::getInstance()->getPlugin(m_plugin_id);
|
SearchPluginIface *plugin = SearchPluginManager::getInstance()->getPlugin(m_plugin_id);
|
||||||
if (plugin) {
|
if (plugin) {
|
||||||
|
|
|
@ -1,22 +1,3 @@
|
||||||
/*
|
|
||||||
*
|
|
||||||
* Copyright (C) 2023, KylinSoft Co., Ltd.
|
|
||||||
*
|
|
||||||
* This program is free software: you can redistribute it and/or modify
|
|
||||||
* it under the terms of the GNU General Public License as published by
|
|
||||||
* the Free Software Foundation, either version 3 of the License, or
|
|
||||||
* (at your option) any later version.
|
|
||||||
*
|
|
||||||
* This program is distributed in the hope that it will be useful,
|
|
||||||
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
||||||
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|
||||||
* GNU General Public License for more details.
|
|
||||||
*
|
|
||||||
* You should have received a copy of the GNU General Public License
|
|
||||||
* along with this program. If not, see <https://www.gnu.org/licenses/>.
|
|
||||||
*
|
|
||||||
*
|
|
||||||
*/
|
|
||||||
#ifndef RESULTVIEW_H
|
#ifndef RESULTVIEW_H
|
||||||
#define RESULTVIEW_H
|
#define RESULTVIEW_H
|
||||||
#include <QTreeView>
|
#include <QTreeView>
|
||||||
|
|
|
@ -3,9 +3,11 @@ INCLUDEPATH += $$PWD
|
||||||
HEADERS += \
|
HEADERS += \
|
||||||
$$PWD/best-list-view.h \
|
$$PWD/best-list-view.h \
|
||||||
$$PWD/result-view-delegate.h \
|
$$PWD/result-view-delegate.h \
|
||||||
$$PWD/result-view.h
|
$$PWD/result-view.h \
|
||||||
|
$$PWD/web-search-view.h
|
||||||
|
|
||||||
SOURCES += \
|
SOURCES += \
|
||||||
$$PWD/best-list-view.cpp \
|
$$PWD/best-list-view.cpp \
|
||||||
$$PWD/result-view-delegate.cpp \
|
$$PWD/result-view-delegate.cpp \
|
||||||
$$PWD/result-view.cpp
|
$$PWD/result-view.cpp \
|
||||||
|
$$PWD/web-search-view.cpp
|
||||||
|
|
|
@ -0,0 +1,195 @@
|
||||||
|
/*
|
||||||
|
*
|
||||||
|
* Copyright (C) 2021, KylinSoft Co., Ltd.
|
||||||
|
*
|
||||||
|
* This program is free software: you can redistribute it and/or modify
|
||||||
|
* it under the terms of the GNU General Public License as published by
|
||||||
|
* the Free Software Foundation, either version 3 of the License, or
|
||||||
|
* (at your option) any later version.
|
||||||
|
*
|
||||||
|
* This program is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
* GNU General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU General Public License
|
||||||
|
* along with this program. If not, see <https://www.gnu.org/licenses/>.
|
||||||
|
*
|
||||||
|
* Authors: jixiaoxu <jixiaoxu@kylinos.cn>
|
||||||
|
*
|
||||||
|
*/
|
||||||
|
#include <QDBusReply>
|
||||||
|
#include "web-search-view.h"
|
||||||
|
#define MAIN_MARGINS 0,0,0,0
|
||||||
|
#define MAIN_SPACING 0
|
||||||
|
#define TITLE_HEIGHT 30
|
||||||
|
#define VIEW_ICON_SIZE 24
|
||||||
|
|
||||||
|
using namespace UkuiSearch;
|
||||||
|
WebSearchView::WebSearchView(QWidget *parent) : QTreeView(parent)
|
||||||
|
{
|
||||||
|
setStyle(ResultItemStyle::getStyle());
|
||||||
|
this->setFrameShape(QFrame::NoFrame);
|
||||||
|
this->viewport()->setAutoFillBackground(false);
|
||||||
|
this->setRootIsDecorated(false);
|
||||||
|
this->setIconSize(QSize(VIEW_ICON_SIZE, VIEW_ICON_SIZE));
|
||||||
|
this->setVerticalScrollBarPolicy(Qt::ScrollBarAlwaysOff);
|
||||||
|
this->setSelectionBehavior(QAbstractItemView::SelectRows);
|
||||||
|
this->setSelectionMode(QAbstractItemView::SingleSelection);
|
||||||
|
this->setHeaderHidden(true);
|
||||||
|
m_model = new WebSearchModel(this);
|
||||||
|
this->setModel(m_model);
|
||||||
|
m_styleDelegate = new ResultViewDelegate(this);
|
||||||
|
this->setItemDelegate(m_styleDelegate);
|
||||||
|
}
|
||||||
|
|
||||||
|
bool WebSearchView::isSelected()
|
||||||
|
{
|
||||||
|
return m_is_selected;
|
||||||
|
}
|
||||||
|
|
||||||
|
int WebSearchView::showHeight()
|
||||||
|
{
|
||||||
|
return this->rowHeight(this->model()->index(0, 0, QModelIndex()));
|
||||||
|
}
|
||||||
|
|
||||||
|
QModelIndex WebSearchView::getModlIndex(int row, int column)
|
||||||
|
{
|
||||||
|
return this->m_model->index(row, column);
|
||||||
|
}
|
||||||
|
|
||||||
|
void WebSearchView::clearSelectedRow()
|
||||||
|
{
|
||||||
|
if (!m_is_selected) {
|
||||||
|
this->blockSignals(true);
|
||||||
|
//this->clearSelection();
|
||||||
|
this->setCurrentIndex(QModelIndex());
|
||||||
|
this->blockSignals(false);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
void WebSearchView::startSearch(const QString & keyword)
|
||||||
|
{
|
||||||
|
this->m_styleDelegate->setSearchKeyword(keyword);
|
||||||
|
this->m_model->startSearch(keyword);
|
||||||
|
m_keyWord = keyword;
|
||||||
|
}
|
||||||
|
|
||||||
|
void WebSearchView::mouseReleaseEvent(QMouseEvent *event)
|
||||||
|
{
|
||||||
|
QModelIndex index = indexAt(event->pos());
|
||||||
|
if (!index.isValid()) {
|
||||||
|
this->clearSelection();
|
||||||
|
}
|
||||||
|
return QTreeView::mouseReleaseEvent(event);
|
||||||
|
}
|
||||||
|
|
||||||
|
void WebSearchView::LaunchBrowser()
|
||||||
|
{
|
||||||
|
QString address;
|
||||||
|
QString engine = GlobalSettings::getInstance()->getValue("web_engine").toString();
|
||||||
|
if(!engine.isEmpty()) {
|
||||||
|
if(engine == "360") {
|
||||||
|
address = "https://so.com/s?q=" + m_keyWord; //360
|
||||||
|
} else if(engine == "sougou") {
|
||||||
|
address = "https://www.sogou.com/web?query=" + m_keyWord; // Sogou
|
||||||
|
} else {
|
||||||
|
address = "http://baidu.com/s?word=" + m_keyWord; // Baidu
|
||||||
|
}
|
||||||
|
} else { // default value
|
||||||
|
address = "http://baidu.com/s?word=" + m_keyWord; // Baidu
|
||||||
|
}
|
||||||
|
bool res(false);
|
||||||
|
QDBusInterface * appLaunchInterface = new QDBusInterface("com.kylin.AppManager",
|
||||||
|
"/com/kylin/AppManager",
|
||||||
|
"com.kylin.AppManager",
|
||||||
|
QDBusConnection::sessionBus());
|
||||||
|
if(!appLaunchInterface->isValid()) {
|
||||||
|
qWarning() << qPrintable(QDBusConnection::sessionBus().lastError().message());
|
||||||
|
res = false;
|
||||||
|
} else {
|
||||||
|
appLaunchInterface->setTimeout(10000);
|
||||||
|
QDBusReply<bool> reply = appLaunchInterface->call("LaunchDefaultAppWithUrl", address);
|
||||||
|
if(reply.isValid()) {
|
||||||
|
res = reply;
|
||||||
|
} else {
|
||||||
|
qWarning() << "AppManager D-Bus call LaunchDefaultAppWithUrl failed:" << reply.error().message();
|
||||||
|
res = false;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if(appLaunchInterface) {
|
||||||
|
delete appLaunchInterface;
|
||||||
|
}
|
||||||
|
appLaunchInterface = nullptr;
|
||||||
|
if (res)
|
||||||
|
return;
|
||||||
|
QDesktopServices::openUrl(address);
|
||||||
|
}
|
||||||
|
|
||||||
|
void WebSearchView::initConnections()
|
||||||
|
{
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
WebSearchWidget::WebSearchWidget(QWidget *parent) : QWidget(parent)
|
||||||
|
{
|
||||||
|
this->initUi();
|
||||||
|
initConnections();
|
||||||
|
}
|
||||||
|
|
||||||
|
QString WebSearchWidget::getWidgetName()
|
||||||
|
{
|
||||||
|
return m_titleLabel->text();
|
||||||
|
}
|
||||||
|
|
||||||
|
void WebSearchWidget::setEnabled(const bool &enabled)
|
||||||
|
{
|
||||||
|
m_enabled = enabled;
|
||||||
|
}
|
||||||
|
|
||||||
|
void WebSearchWidget::clearResultSelection()
|
||||||
|
{
|
||||||
|
this->m_webSearchView->setCurrentIndex(QModelIndex());
|
||||||
|
}
|
||||||
|
|
||||||
|
QModelIndex WebSearchWidget::getModlIndex(int row, int column)
|
||||||
|
{
|
||||||
|
return this->m_webSearchView->getModlIndex(row, column);
|
||||||
|
}
|
||||||
|
|
||||||
|
void WebSearchWidget::setResultSelection(const QModelIndex &index)
|
||||||
|
{
|
||||||
|
this->m_webSearchView->setCurrentIndex(index);
|
||||||
|
}
|
||||||
|
|
||||||
|
void WebSearchWidget::LaunchBrowser()
|
||||||
|
{
|
||||||
|
this->m_webSearchView->LaunchBrowser();
|
||||||
|
}
|
||||||
|
|
||||||
|
void WebSearchWidget::initUi()
|
||||||
|
{
|
||||||
|
m_mainLyt = new QVBoxLayout(this);
|
||||||
|
this->setLayout(m_mainLyt);
|
||||||
|
m_mainLyt->setContentsMargins(MAIN_MARGINS);
|
||||||
|
m_mainLyt->setSpacing(MAIN_SPACING);
|
||||||
|
|
||||||
|
m_titleLabel = new TitleLabel(this);
|
||||||
|
m_titleLabel->setText(tr("Web Page"));
|
||||||
|
m_titleLabel->setFixedHeight(TITLE_HEIGHT);
|
||||||
|
|
||||||
|
m_webSearchView = new WebSearchView(this);
|
||||||
|
|
||||||
|
m_mainLyt->addWidget(m_titleLabel);
|
||||||
|
m_mainLyt->addWidget(m_webSearchView);
|
||||||
|
this->setFixedHeight(m_webSearchView->height() + TITLE_HEIGHT);
|
||||||
|
this->setFixedWidth(656);
|
||||||
|
}
|
||||||
|
|
||||||
|
void WebSearchWidget::initConnections()
|
||||||
|
{
|
||||||
|
connect(this, &WebSearchWidget::startSearch, m_webSearchView, &WebSearchView::startSearch);
|
||||||
|
connect(m_webSearchView, &WebSearchView::clicked, this, [=] () {
|
||||||
|
this->LaunchBrowser();
|
||||||
|
});
|
||||||
|
}
|
|
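Note on `WebSearchView::LaunchBrowser()` above: the search address is built by appending `m_keyWord` to the engine's base URL as-is, so a keyword containing characters such as `&`, `#` or spaces may not reach the search engine intact. The following is a minimal standalone sketch of the same engine selection with the keyword percent-encoded first. It uses only the C++ standard library so it can be compiled on its own; `percentEncode` and `buildSearchUrl` are hypothetical helper names that do not exist in ukui-search, and in the Qt code the encoding step would more likely go through `QUrl::toPercentEncoding`.

```cpp
#include <string>

// Hypothetical helper (not part of ukui-search): percent-encode every byte
// except the RFC 3986 "unreserved" characters (letters, digits, '-', '.', '_', '~').
std::string percentEncode(const std::string &text)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : text) {
        const bool unreserved =
            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') ||
            c == '-' || c == '.' || c == '_' || c == '~';
        if (unreserved) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];    // high nibble
            out += hex[c & 0x0F];  // low nibble
        }
    }
    return out;
}

// Hypothetical helper mirroring the engine selection in LaunchBrowser():
// "360" and "sougou" get their own base URL, anything else (including an
// empty setting) falls back to Baidu, as in the original code.
std::string buildSearchUrl(const std::string &engine, const std::string &keyword)
{
    std::string base;
    if (engine == "360") {
        base = "https://so.com/s?q=";
    } else if (engine == "sougou") {
        base = "https://www.sogou.com/web?query=";
    } else {
        base = "http://baidu.com/s?word=";
    }
    return base + percentEncode(keyword);
}
```

With these helpers, `buildSearchUrl("360", "a b&c")` yields `https://so.com/s?q=a%20b%26c`, so the `&` stays part of the query value instead of starting a new parameter. Collapsing the two Baidu branches into one `else` is a simplification of the sketch; the behavior matches the original branches.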
@ -0,0 +1,72 @@
|
||||||
|
#ifndef WEBSEARCHVIEW_H
|
||||||
|
#define WEBSEARCHVIEW_H
|
||||||
|
#include <QTreeView>
|
||||||
|
#include <QListView>
|
||||||
|
#include <QMouseEvent>
|
||||||
|
#include "web-search-model.h"
|
||||||
|
#include "result-view-delegate.h"
|
||||||
|
#include "title-label.h"
|
||||||
|
|
||||||
|
namespace UkuiSearch {
|
||||||
|
class WebSearchView : public QTreeView
|
||||||
|
{
|
||||||
|
Q_OBJECT
|
||||||
|
public:
|
||||||
|
WebSearchView(QWidget *parent = nullptr);
|
||||||
|
~WebSearchView() = default;
|
||||||
|
|
||||||
|
bool isSelected();
|
||||||
|
int showHeight();
|
||||||
|
QModelIndex getModlIndex(int row, int column);
|
||||||
|
void LaunchBrowser();
|
||||||
|
|
||||||
|
public Q_SLOTS:
|
||||||
|
void clearSelectedRow();
|
||||||
|
void startSearch(const QString &);
|
||||||
|
|
||||||
|
protected:
|
||||||
|
void mouseReleaseEvent(QMouseEvent *event);
|
||||||
|
|
||||||
|
private:
|
||||||
|
void initConnections();
|
||||||
|
|
||||||
|
WebSearchModel * m_model = nullptr;
|
||||||
|
bool m_is_selected = false;
|
||||||
|
ResultViewDelegate * m_styleDelegate = nullptr;
|
||||||
|
QString m_keyWord;
|
||||||
|
};
|
||||||
|
|
||||||
|
class WebSearchWidget : public QWidget
|
||||||
|
{
|
||||||
|
Q_OBJECT
|
||||||
|
public:
|
||||||
|
WebSearchWidget(QWidget *parent = nullptr);
|
||||||
|
~WebSearchWidget() = default;
|
||||||
|
|
||||||
|
QString getWidgetName();
|
||||||
|
void setEnabled(const bool&);
|
||||||
|
void clearResultSelection();
|
||||||
|
QModelIndex getModlIndex(int row, int column);
|
||||||
|
void setResultSelection(const QModelIndex &index);
|
||||||
|
void LaunchBrowser();
|
||||||
|
|
||||||
|
private:
|
||||||
|
void initUi();
|
||||||
|
void initConnections();
|
||||||
|
|
||||||
|
bool m_enabled = true;
|
||||||
|
QVBoxLayout * m_mainLyt = nullptr;
|
||||||
|
QHBoxLayout * m_resultLyt = nullptr;
|
||||||
|
TitleLabel * m_titleLabel = nullptr;
|
||||||
|
WebSearchView * m_webSearchView = nullptr;
|
||||||
|
QLabel * m_queryIcon = nullptr;
|
||||||
|
|
||||||
|
Q_SIGNALS:
|
||||||
|
void startSearch(const QString &);
|
||||||
|
void clearSelectedRow();
|
||||||
|
void rowClicked();
|
||||||
|
|
||||||
|
};
|
||||||
|
|
||||||
|
}
|
||||||
|
#endif // WEBSEARCHVIEW_H
|
|
@ -1 +0,0 @@
|
||||||
Subproject commit 4734827d7c31936f1485e4513316b05cb7c8714f
|
|
|
@ -0,0 +1,674 @@
|
||||||
|
GNU GENERAL PUBLIC LICENSE
|
||||||
|
Version 3, 29 June 2007
|
||||||
|
|
||||||
|
Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/>
|
||||||
|
Everyone is permitted to copy and distribute verbatim copies
|
||||||
|
of this license document, but changing it is not allowed.
|
||||||
|
|
||||||
|
Preamble
|
||||||
|
|
||||||
|
The GNU General Public License is a free, copyleft license for
|
||||||
|
software and other kinds of works.
|
||||||
|
|
||||||
|
The licenses for most software and other practical works are designed
|
||||||
|
to take away your freedom to share and change the works. By contrast,
|
||||||
|
the GNU General Public License is intended to guarantee your freedom to
|
||||||
|
share and change all versions of a program--to make sure it remains free
|
||||||
|
software for all its users. We, the Free Software Foundation, use the
|
||||||
|
GNU General Public License for most of our software; it applies also to
|
||||||
|
any other work released this way by its authors. You can apply it to
|
||||||
|
your programs, too.
|
||||||
|
|
||||||
|
When we speak of free software, we are referring to freedom, not
|
||||||
|
price. Our General Public Licenses are designed to make sure that you
|
||||||
|
have the freedom to distribute copies of free software (and charge for
|
||||||
|
them if you wish), that you receive source code or can get it if you
|
||||||
|
want it, that you can change the software or use pieces of it in new
|
||||||
|
free programs, and that you know you can do these things.
|
||||||
|
|
||||||
|
To protect your rights, we need to prevent others from denying you
|
||||||
|
these rights or asking you to surrender the rights. Therefore, you have
|
||||||
|
certain responsibilities if you distribute copies of the software, or if
|
||||||
|
you modify it: responsibilities to respect the freedom of others.
|
||||||
|
|
||||||
|
For example, if you distribute copies of such a program, whether
|
||||||
|
gratis or for a fee, you must pass on to the recipients the same
|
||||||
|
freedoms that you received. You must make sure that they, too, receive
|
||||||
|
or can get the source code. And you must show them these terms so they
|
||||||
|
know their rights.
|
||||||
|
|
||||||
|
Developers that use the GNU GPL protect your rights with two steps:
|
||||||
|
(1) assert copyright on the software, and (2) offer you this License
|
||||||
|
giving you legal permission to copy, distribute and/or modify it.
|
||||||
|
|
||||||
|
For the developers' and authors' protection, the GPL clearly explains
|
||||||
|
that there is no warranty for this free software. For both users' and
|
||||||
|
authors' sake, the GPL requires that modified versions be marked as
|
||||||
|
changed, so that their problems will not be attributed erroneously to
|
||||||
|
authors of previous versions.
|
||||||
|
|
||||||
|
Some devices are designed to deny users access to install or run
|
||||||
|
modified versions of the software inside them, although the manufacturer
|
||||||
|
can do so. This is fundamentally incompatible with the aim of
|
||||||
|
protecting users' freedom to change the software. The systematic
|
||||||
|
pattern of such abuse occurs in the area of products for individuals to
|
||||||
|
use, which is precisely where it is most unacceptable. Therefore, we
|
||||||
|
have designed this version of the GPL to prohibit the practice for those
|
||||||
|
products. If such problems arise substantially in other domains, we
|
||||||
|
stand ready to extend this provision to those domains in future versions
|
||||||
|
of the GPL, as needed to protect the freedom of users.
|
||||||
|
|
||||||
|
Finally, every program is threatened constantly by software patents.
|
||||||
|
States should not allow patents to restrict development and use of
|
||||||
|
software on general-purpose computers, but in those that do, we wish to
|
||||||
|
avoid the special danger that patents applied to a free program could
|
||||||
|
make it effectively proprietary. To prevent this, the GPL assures that
|
||||||
|
patents cannot be used to render the program non-free.
|
||||||
|
|
||||||
|
The precise terms and conditions for copying, distribution and
|
||||||
|
modification follow.
|
||||||
|
|
||||||
|
TERMS AND CONDITIONS
|
||||||
|
|
||||||
|
0. Definitions.
|
||||||
|
|
||||||
|
"This License" refers to version 3 of the GNU General Public License.
|
||||||
|
|
||||||
|
"Copyright" also means copyright-like laws that apply to other kinds of
|
||||||
|
works, such as semiconductor masks.
|
||||||
|
|
||||||
|
"The Program" refers to any copyrightable work licensed under this
|
||||||
|
License. Each licensee is addressed as "you". "Licensees" and
|
||||||
|
"recipients" may be individuals or organizations.
|
||||||
|
|
||||||
|
To "modify" a work means to copy from or adapt all or part of the work
|
||||||
|
in a fashion requiring copyright permission, other than the making of an
|
||||||
|
exact copy. The resulting work is called a "modified version" of the
|
||||||
|
earlier work or a work "based on" the earlier work.
|
||||||
|
|
||||||
|
A "covered work" means either the unmodified Program or a work based
|
||||||
|
on the Program.
|
||||||
|
|
||||||
|
To "propagate" a work means to do anything with it that, without
|
||||||
|
permission, would make you directly or secondarily liable for
|
||||||
|
infringement under applicable copyright law, except executing it on a
|
||||||
|
computer or modifying a private copy. Propagation includes copying,
|
||||||
|
distribution (with or without modification), making available to the
|
||||||
|
public, and in some countries other activities as well.
|
||||||
|
|
||||||
|
To "convey" a work means any kind of propagation that enables other
|
||||||
|
parties to make or receive copies. Mere interaction with a user through
|
||||||
|
a computer network, with no transfer of a copy, is not conveying.
|
||||||
|
|
||||||
|
An interactive user interface displays "Appropriate Legal Notices"
|
||||||
|
to the extent that it includes a convenient and prominently visible
|
||||||
|
feature that (1) displays an appropriate copyright notice, and (2)
|
||||||
|
tells the user that there is no warranty for the work (except to the
|
||||||
|
extent that warranties are provided), that licensees may convey the
|
||||||
|
work under this License, and how to view a copy of this License. If
|
||||||
|
the interface presents a list of user commands or options, such as a
|
||||||
|
menu, a prominent item in the list meets this criterion.
|
||||||
|
|
||||||
|
1. Source Code.
|
||||||
|
|
||||||
|
The "source code" for a work means the preferred form of the work
|
||||||
|
for making modifications to it. "Object code" means any non-source
|
||||||
|
form of a work.
|
||||||
|
|
||||||
|
A "Standard Interface" means an interface that either is an official
|
||||||
|
standard defined by a recognized standards body, or, in the case of
|
||||||
|
interfaces specified for a particular programming language, one that
|
||||||
|
is widely used among developers working in that language.
|
||||||
|
|
||||||
|
The "System Libraries" of an executable work include anything, other
|
||||||
|
than the work as a whole, that (a) is included in the normal form of
|
||||||
|
packaging a Major Component, but which is not part of that Major
|
||||||
|
Component, and (b) serves only to enable use of the work with that
|
||||||
|
Major Component, or to implement a Standard Interface for which an
|
||||||
|
implementation is available to the public in source code form. A
|
||||||
|
"Major Component", in this context, means a major essential component
|
||||||
|
(kernel, window system, and so on) of the specific operating system
|
||||||
|
(if any) on which the executable work runs, or a compiler used to
|
||||||
|
produce the work, or an object code interpreter used to run it.
|
||||||
|
|
||||||
|
The "Corresponding Source" for a work in object code form means all
|
||||||
|
the source code needed to generate, install, and (for an executable
|
||||||
|
work) run the object code and to modify the work, including scripts to
|
||||||
|
control those activities. However, it does not include the work's
|
||||||
|
System Libraries, or general-purpose tools or generally available free
|
||||||
|
programs which are used unmodified in performing those activities but
|
||||||
|
which are not part of the work. For example, Corresponding Source
|
||||||
|
includes interface definition files associated with source files for
|
||||||
|
the work, and the source code for shared libraries and dynamically
|
||||||
|
linked subprograms that the work is specifically designed to require,
|
||||||
|
such as by intimate data communication or control flow between those
|
||||||
|
subprograms and other parts of the work.
|
||||||
|
|
||||||
|
The Corresponding Source need not include anything that users
|
||||||
|
can regenerate automatically from other parts of the Corresponding
|
||||||
|
Source.
|
||||||
|
|
||||||
|
The Corresponding Source for a work in source code form is that
|
||||||
|
same work.
|
||||||
|
|
||||||
|
2. Basic Permissions.
|
||||||
|
|
||||||
|
All rights granted under this License are granted for the term of
|
||||||
|
copyright on the Program, and are irrevocable provided the stated
|
||||||
|
conditions are met. This License explicitly affirms your unlimited
|
||||||
|
permission to run the unmodified Program. The output from running a
|
||||||
|
covered work is covered by this License only if the output, given its
|
||||||
|
content, constitutes a covered work. This License acknowledges your
|
||||||
|
rights of fair use or other equivalent, as provided by copyright law.
|
||||||
|
|
||||||
|
You may make, run and propagate covered works that you do not
|
||||||
|
convey, without conditions so long as your license otherwise remains
|
||||||
|
in force. You may convey covered works to others for the sole purpose
|
||||||
|
of having them make modifications exclusively for you, or provide you
|
||||||
|
with facilities for running those works, provided that you comply with
|
||||||
|
the terms of this License in conveying all material for which you do
|
||||||
|
not control copyright. Those thus making or running the covered works
|
||||||
|
for you must do so exclusively on your behalf, under your direction
|
||||||
|
and control, on terms that prohibit them from making any copies of
|
||||||
|
your copyrighted material outside their relationship with you.
|
||||||
|
|
||||||
|
Conveying under any other circumstances is permitted solely under
|
||||||
|
the conditions stated below. Sublicensing is not allowed; section 10
|
||||||
|
makes it unnecessary.
|
||||||
|
|
||||||
|
3. Protecting Users' Legal Rights From Anti-Circumvention Law.
|
||||||
|
|
||||||
|
No covered work shall be deemed part of an effective technological
|
||||||
|
measure under any applicable law fulfilling obligations under article
|
||||||
|
11 of the WIPO copyright treaty adopted on 20 December 1996, or
|
||||||
|
similar laws prohibiting or restricting circumvention of such
|
||||||
|
measures.
|
||||||
|
|
||||||
|
When you convey a covered work, you waive any legal power to forbid
|
||||||
|
circumvention of technological measures to the extent such circumvention
|
||||||
|
is effected by exercising rights under this License with respect to
|
||||||
|
the covered work, and you disclaim any intention to limit operation or
|
||||||
|
modification of the work as a means of enforcing, against the work's
|
||||||
|
users, your or third parties' legal rights to forbid circumvention of
|
||||||
|
technological measures.
|
||||||
|
|
||||||
|
4. Conveying Verbatim Copies.
|
||||||
|
|
||||||
|
You may convey verbatim copies of the Program's source code as you
|
||||||
|
receive it, in any medium, provided that you conspicuously and
|
||||||
|
appropriately publish on each copy an appropriate copyright notice;
|
||||||
|
keep intact all notices stating that this License and any
|
||||||
|
non-permissive terms added in accord with section 7 apply to the code;
|
||||||
|
keep intact all notices of the absence of any warranty; and give all
|
||||||
|
recipients a copy of this License along with the Program.
|
||||||
|
|
||||||
|
You may charge any price or no price for each copy that you convey,
|
||||||
|
and you may offer support or warranty protection for a fee.
|
||||||
|
|
||||||
|
5. Conveying Modified Source Versions.
|
||||||
|
|
||||||
|
You may convey a work based on the Program, or the modifications to
|
||||||
|
produce it from the Program, in the form of source code under the
|
||||||
|
terms of section 4, provided that you also meet all of these conditions:
|
||||||
|
|
||||||
|
a) The work must carry prominent notices stating that you modified
|
||||||
|
it, and giving a relevant date.
|
||||||
|
|
||||||
|
b) The work must carry prominent notices stating that it is
|
||||||
|
released under this License and any conditions added under section
|
||||||
|
7. This requirement modifies the requirement in section 4 to
|
||||||
|
"keep intact all notices".
|
||||||
|
|
||||||
|
c) You must license the entire work, as a whole, under this
|
||||||
|
License to anyone who comes into possession of a copy. This
|
||||||
|
License will therefore apply, along with any applicable section 7
|
||||||
|
additional terms, to the whole of the work, and all its parts,
|
||||||
|
regardless of how they are packaged. This License gives no
|
||||||
|
permission to license the work in any other way, but it does not
|
||||||
|
invalidate such permission if you have separately received it.
|
||||||
|
|
||||||
|
d) If the work has interactive user interfaces, each must display
|
||||||
|
Appropriate Legal Notices; however, if the Program has interactive
|
||||||
|
interfaces that do not display Appropriate Legal Notices, your
|
||||||
|
work need not make them do so.
|
||||||
|
|
||||||
|
A compilation of a covered work with other separate and independent
|
||||||
|
works, which are not by their nature extensions of the covered work,
|
||||||
|
and which are not combined with it such as to form a larger program,
|
||||||
|
in or on a volume of a storage or distribution medium, is called an
|
||||||
|
"aggregate" if the compilation and its resulting copyright are not
|
||||||
|
used to limit the access or legal rights of the compilation's users
|
||||||
|
beyond what the individual works permit. Inclusion of a covered work
|
||||||
|
in an aggregate does not cause this License to apply to the other
|
||||||
|
parts of the aggregate.
|
||||||
|
|
||||||
|
6. Conveying Non-Source Forms.
|
||||||
|
|
||||||
|
You may convey a covered work in object code form under the terms
|
||||||
|
of sections 4 and 5, provided that you also convey the
|
||||||
|
machine-readable Corresponding Source under the terms of this License,
|
||||||
|
in one of these ways:
|
||||||
|
|
||||||
|
a) Convey the object code in, or embodied in, a physical product
|
||||||
|
(including a physical distribution medium), accompanied by the
|
||||||
|
Corresponding Source fixed on a durable physical medium
|
||||||
|
customarily used for software interchange.
|
||||||
|
|
||||||
|
b) Convey the object code in, or embodied in, a physical product
|
||||||
|
(including a physical distribution medium), accompanied by a
|
||||||
|
written offer, valid for at least three years and valid for as
|
||||||
|
long as you offer spare parts or customer support for that product
|
||||||
|
model, to give anyone who possesses the object code either (1) a
|
||||||
|
copy of the Corresponding Source for all the software in the
|
||||||
|
product that is covered by this License, on a durable physical
|
||||||
|
medium customarily used for software interchange, for a price no
|
||||||
|
more than your reasonable cost of physically performing this
|
||||||
|
conveying of source, or (2) access to copy the
|
||||||
|
Corresponding Source from a network server at no charge.
|
||||||
|
|
||||||
|
c) Convey individual copies of the object code with a copy of the
|
||||||
|
written offer to provide the Corresponding Source. This
|
||||||
|
alternative is allowed only occasionally and noncommercially, and
|
||||||
|
only if you received the object code with such an offer, in accord
|
||||||
|
with subsection 6b.
|
||||||
|
|
||||||
|
d) Convey the object code by offering access from a designated
|
||||||
|
place (gratis or for a charge), and offer equivalent access to the
|
||||||
|
Corresponding Source in the same way through the same place at no
|
||||||
|
further charge. You need not require recipients to copy the
|
||||||
|
Corresponding Source along with the object code. If the place to
|
||||||
|
copy the object code is a network server, the Corresponding Source
|
||||||
|
may be on a different server (operated by you or a third party)
|
||||||
|
that supports equivalent copying facilities, provided you maintain
|
||||||
|
clear directions next to the object code saying where to find the
|
||||||
|
Corresponding Source. Regardless of what server hosts the
|
||||||
|
Corresponding Source, you remain obligated to ensure that it is
|
||||||
|
available for as long as needed to satisfy these requirements.
|
||||||
|
|
||||||
|
e) Convey the object code using peer-to-peer transmission, provided
|
||||||
|
you inform other peers where the object code and Corresponding
|
||||||
|
Source of the work are being offered to the general public at no
|
||||||
|
charge under subsection 6d.
|
||||||
|
|
||||||
|
A separable portion of the object code, whose source code is excluded
|
||||||
|
from the Corresponding Source as a System Library, need not be
|
||||||
|
included in conveying the object code work.
|
||||||
|
|
||||||
|
A "User Product" is either (1) a "consumer product", which means any
|
||||||
|
tangible personal property which is normally used for personal, family,
|
||||||
|
or household purposes, or (2) anything designed or sold for incorporation
|
||||||
|
into a dwelling. In determining whether a product is a consumer product,
|
||||||
|
doubtful cases shall be resolved in favor of coverage. For a particular
|
||||||
|
product received by a particular user, "normally used" refers to a
|
||||||
|
typical or common use of that class of product, regardless of the status
|
||||||
|
of the particular user or of the way in which the particular user
|
||||||
|
actually uses, or expects or is expected to use, the product. A product
|
||||||
|
is a consumer product regardless of whether the product has substantial
|
||||||
|
commercial, industrial or non-consumer uses, unless such uses represent
|
||||||
|
the only significant mode of use of the product.
|
||||||
|
|
||||||
|
"Installation Information" for a User Product means any methods,
|
||||||
|
procedures, authorization keys, or other information required to install
|
||||||
|
and execute modified versions of a covered work in that User Product from
|
||||||
|
a modified version of its Corresponding Source. The information must
|
||||||
|
suffice to ensure that the continued functioning of the modified object
|
||||||
|
code is in no case prevented or interfered with solely because
|
||||||
|
modification has been made.
|
||||||
|
|
||||||
|
If you convey an object code work under this section in, or with, or
|
||||||
|
specifically for use in, a User Product, and the conveying occurs as
|
||||||
|
part of a transaction in which the right of possession and use of the
|
||||||
|
User Product is transferred to the recipient in perpetuity or for a
|
||||||
|
fixed term (regardless of how the transaction is characterized), the
|
||||||
|
Corresponding Source conveyed under this section must be accompanied
|
||||||
|
by the Installation Information. But this requirement does not apply
|
||||||
|
if neither you nor any third party retains the ability to install
|
||||||
|
modified object code on the User Product (for example, the work has
|
||||||
|
been installed in ROM).
|
||||||
|
|
||||||
|
The requirement to provide Installation Information does not include a
|
||||||
|
requirement to continue to provide support service, warranty, or updates
|
||||||
|
for a work that has been modified or installed by the recipient, or for
|
||||||
|
the User Product in which it has been modified or installed. Access to a
|
||||||
|
network may be denied when the modification itself materially and
|
||||||
|
adversely affects the operation of the network or violates the rules and
|
||||||
|
protocols for communication across the network.
|
||||||
|
|
||||||
|
Corresponding Source conveyed, and Installation Information provided,
|
||||||
|
in accord with this section must be in a format that is publicly
|
||||||
|
documented (and with an implementation available to the public in
|
||||||
|
source code form), and must require no special password or key for
|
||||||
|
unpacking, reading or copying.
|
||||||
|
|
||||||
|
7. Additional Terms.
|
||||||
|
|
||||||
|
"Additional permissions" are terms that supplement the terms of this
|
||||||
|
License by making exceptions from one or more of its conditions.
|
||||||
|
Additional permissions that are applicable to the entire Program shall
|
||||||
|
be treated as though they were included in this License, to the extent
|
||||||
|
that they are valid under applicable law. If additional permissions
|
||||||
|
apply only to part of the Program, that part may be used separately
|
||||||
|
under those permissions, but the entire Program remains governed by
|
||||||
|
this License without regard to the additional permissions.
|
||||||
|
|
||||||
|
When you convey a copy of a covered work, you may at your option
|
||||||
|
remove any additional permissions from that copy, or from any part of
|
||||||
|
it. (Additional permissions may be written to require their own
|
||||||
|
removal in certain cases when you modify the work.) You may place
|
||||||
|
additional permissions on material, added by you to a covered work,
|
||||||
|
for which you have or can give appropriate copyright permission.
|
||||||
|
|
||||||
|
Notwithstanding any other provision of this License, for material you
|
||||||
|
add to a covered work, you may (if authorized by the copyright holders of
|
||||||
|
that material) supplement the terms of this License with terms:
|
||||||
|
|
||||||
|
a) Disclaiming warranty or limiting liability differently from the
|
||||||
|
terms of sections 15 and 16 of this License; or
|
||||||
|
|
||||||
|
b) Requiring preservation of specified reasonable legal notices or
|
||||||
|
author attributions in that material or in the Appropriate Legal
|
||||||
|
Notices displayed by works containing it; or
|
||||||
|
|
||||||
|
c) Prohibiting misrepresentation of the origin of that material, or
|
||||||
|
requiring that modified versions of such material be marked in
|
||||||
|
reasonable ways as different from the original version; or
|
||||||
|
|
||||||
|
d) Limiting the use for publicity purposes of names of licensors or
|
||||||
|
authors of the material; or
|
||||||
|
|
||||||
|
e) Declining to grant rights under trademark law for use of some
|
||||||
|
trade names, trademarks, or service marks; or
|
||||||
|
|
||||||
|
f) Requiring indemnification of licensors and authors of that
|
||||||
|
material by anyone who conveys the material (or modified versions of
|
||||||
|
it) with contractual assumptions of liability to the recipient, for
|
||||||
|
any liability that these contractual assumptions directly impose on
|
||||||
|
those licensors and authors.
|
||||||
|
|
||||||
|
All other non-permissive additional terms are considered "further
|
||||||
|
restrictions" within the meaning of section 10. If the Program as you
|
||||||
|
received it, or any part of it, contains a notice stating that it is
|
||||||
|
governed by this License along with a term that is a further
|
||||||
|
restriction, you may remove that term. If a license document contains
|
||||||
|
a further restriction but permits relicensing or conveying under this
|
||||||
|
License, you may add to a covered work material governed by the terms
|
||||||
|
of that license document, provided that the further restriction does
|
||||||
|
not survive such relicensing or conveying.
|
||||||
|
|
||||||
|
If you add terms to a covered work in accord with this section, you
|
||||||
|
must place, in the relevant source files, a statement of the
|
||||||
|
additional terms that apply to those files, or a notice indicating
|
||||||
|
where to find the applicable terms.
|
||||||
|
|
||||||
|
Additional terms, permissive or non-permissive, may be stated in the
|
||||||
|
form of a separately written license, or stated as exceptions;
|
||||||
|
the above requirements apply either way.
|
||||||
|
|
||||||
|
8. Termination.
|
||||||
|
|
||||||
|
You may not propagate or modify a covered work except as expressly
|
||||||
|
provided under this License. Any attempt otherwise to propagate or
|
||||||
|
modify it is void, and will automatically terminate your rights under
|
||||||
|
this License (including any patent licenses granted under the third
|
||||||
|
paragraph of section 11).
|
||||||
|
|
||||||
|
However, if you cease all violation of this License, then your
|
||||||
|
license from a particular copyright holder is reinstated (a)
|
||||||
|
provisionally, unless and until the copyright holder explicitly and
|
||||||
|
finally terminates your license, and (b) permanently, if the copyright
|
||||||
|
holder fails to notify you of the violation by some reasonable means
|
||||||
|
prior to 60 days after the cessation.
|
||||||
|
|
||||||
|
Moreover, your license from a particular copyright holder is
|
||||||
|
reinstated permanently if the copyright holder notifies you of the
|
||||||
|
violation by some reasonable means, this is the first time you have
|
||||||
|
received notice of violation of this License (for any work) from that
|
||||||
|
copyright holder, and you cure the violation prior to 30 days after
|
||||||
|
your receipt of the notice.
|
||||||
|
|
||||||
|
Termination of your rights under this section does not terminate the
|
||||||
|
licenses of parties who have received copies or rights from you under
|
||||||
|
this License. If your rights have been terminated and not permanently
|
||||||
|
reinstated, you do not qualify to receive new licenses for the same
|
||||||
|
material under section 10.
|
||||||
|
|
||||||
|
9. Acceptance Not Required for Having Copies.
|
||||||
|
|
||||||
|
You are not required to accept this License in order to receive or
|
||||||
|
run a copy of the Program. Ancillary propagation of a covered work
|
||||||
|
occurring solely as a consequence of using peer-to-peer transmission
|
||||||
|
to receive a copy likewise does not require acceptance. However,
|
||||||
|
nothing other than this License grants you permission to propagate or
|
||||||
|
modify any covered work. These actions infringe copyright if you do
|
||||||
|
not accept this License. Therefore, by modifying or propagating a
|
||||||
|
covered work, you indicate your acceptance of this License to do so.
|
||||||
|
|
||||||
|
10. Automatic Licensing of Downstream Recipients.
|
||||||
|
|
||||||
|
Each time you convey a covered work, the recipient automatically
|
||||||
|
receives a license from the original licensors, to run, modify and
|
||||||
|
propagate that work, subject to this License. You are not responsible
|
||||||
|
for enforcing compliance by third parties with this License.
|
||||||
|
|
||||||
|
An "entity transaction" is a transaction transferring control of an
|
||||||
|
organization, or substantially all assets of one, or subdividing an
|
||||||
|
organization, or merging organizations. If propagation of a covered
|
||||||
|
work results from an entity transaction, each party to that
|
||||||
|
transaction who receives a copy of the work also receives whatever
|
||||||
|
licenses to the work the party's predecessor in interest had or could
|
||||||
|
give under the previous paragraph, plus a right to possession of the
|
||||||
|
Corresponding Source of the work from the predecessor in interest, if
|
||||||
|
the predecessor has it or can get it with reasonable efforts.
|
||||||
|
|
||||||
|
You may not impose any further restrictions on the exercise of the
|
||||||
|
rights granted or affirmed under this License. For example, you may
|
||||||
|
not impose a license fee, royalty, or other charge for exercise of
|
||||||
|
rights granted under this License, and you may not initiate litigation
|
||||||
|
(including a cross-claim or counterclaim in a lawsuit) alleging that
|
||||||
|
any patent claim is infringed by making, using, selling, offering for
|
||||||
|
sale, or importing the Program or any portion of it.
|
||||||
|
|
||||||
|
11. Patents.
|
||||||
|
|
||||||
|
A "contributor" is a copyright holder who authorizes use under this
|
||||||
|
License of the Program or a work on which the Program is based. The
|
||||||
|
work thus licensed is called the contributor's "contributor version".
|
||||||
|
|
||||||
|
A contributor's "essential patent claims" are all patent claims
|
||||||
|
owned or controlled by the contributor, whether already acquired or
|
||||||
|
hereafter acquired, that would be infringed by some manner, permitted
|
||||||
|
by this License, of making, using, or selling its contributor version,
|
||||||
|
but do not include claims that would be infringed only as a
|
||||||
|
consequence of further modification of the contributor version. For
|
||||||
|
purposes of this definition, "control" includes the right to grant
|
||||||
|
patent sublicenses in a manner consistent with the requirements of
|
||||||
|
this License.
|
||||||
|
|
||||||
|
Each contributor grants you a non-exclusive, worldwide, royalty-free
|
||||||
|
patent license under the contributor's essential patent claims, to
|
||||||
|
make, use, sell, offer for sale, import and otherwise run, modify and
|
||||||
|
propagate the contents of its contributor version.
|
||||||
|
|
||||||
|
In the following three paragraphs, a "patent license" is any express
|
||||||
|
agreement or commitment, however denominated, not to enforce a patent
|
||||||
|
(such as an express permission to practice a patent or covenant not to
|
||||||
|
sue for patent infringement). To "grant" such a patent license to a
|
||||||
|
party means to make such an agreement or commitment not to enforce a
|
||||||
|
patent against the party.
|
||||||
|
|
||||||
|
If you convey a covered work, knowingly relying on a patent license,
|
||||||
|
and the Corresponding Source of the work is not available for anyone
|
||||||
|
to copy, free of charge and under the terms of this License, through a
|
||||||
|
publicly available network server or other readily accessible means,
|
||||||
|
then you must either (1) cause the Corresponding Source to be so
|
||||||
|
available, or (2) arrange to deprive yourself of the benefit of the
|
||||||
|
patent license for this particular work, or (3) arrange, in a manner
|
||||||
|
consistent with the requirements of this License, to extend the patent
|
||||||
|
license to downstream recipients. "Knowingly relying" means you have
|
||||||
|
actual knowledge that, but for the patent license, your conveying the
|
||||||
|
covered work in a country, or your recipient's use of the covered work
|
||||||
|
in a country, would infringe one or more identifiable patents in that
|
||||||
|
country that you have reason to believe are valid.
|
||||||
|
|
||||||
|
If, pursuant to or in connection with a single transaction or
|
||||||
|
arrangement, you convey, or propagate by procuring conveyance of, a
|
||||||
|
covered work, and grant a patent license to some of the parties
|
||||||
|
receiving the covered work authorizing them to use, propagate, modify
|
||||||
|
or convey a specific copy of the covered work, then the patent license
|
||||||
|
you grant is automatically extended to all recipients of the covered
|
||||||
|
work and works based on it.
|
||||||
|
|
||||||
|
A patent license is "discriminatory" if it does not include within
|
||||||
|
the scope of its coverage, prohibits the exercise of, or is
|
||||||
|
conditioned on the non-exercise of one or more of the rights that are
|
||||||
|
specifically granted under this License. You may not convey a covered
|
||||||
|
work if you are a party to an arrangement with a third party that is
|
||||||
|
in the business of distributing software, under which you make payment
|
||||||
|
to the third party based on the extent of your activity of conveying
|
||||||
|
the work, and under which the third party grants, to any of the
|
||||||
|
parties who would receive the covered work from you, a discriminatory
|
||||||
|
patent license (a) in connection with copies of the covered work
|
||||||
|
conveyed by you (or copies made from those copies), or (b) primarily
|
||||||
|
for and in connection with specific products or compilations that
|
||||||
|
contain the covered work, unless you entered into that arrangement,
|
||||||
|
or that patent license was granted, prior to 28 March 2007.
|
||||||
|
|
||||||
|
Nothing in this License shall be construed as excluding or limiting
|
||||||
|
any implied license or other defenses to infringement that may
|
||||||
|
otherwise be available to you under applicable patent law.
|
||||||
|
|
||||||
|
12. No Surrender of Others' Freedom.
|
||||||
|
|
||||||
|
If conditions are imposed on you (whether by court order, agreement or
|
||||||
|
otherwise) that contradict the conditions of this License, they do not
|
||||||
|
excuse you from the conditions of this License. If you cannot convey a
|
||||||
|
covered work so as to satisfy simultaneously your obligations under this
|
||||||
|
License and any other pertinent obligations, then as a consequence you may
|
||||||
|
not convey it at all. For example, if you agree to terms that obligate you
|
||||||
|
to collect a royalty for further conveying from those to whom you convey
|
||||||
|
the Program, the only way you could satisfy both those terms and this
|
||||||
|
License would be to refrain entirely from conveying the Program.
|
||||||
|
|
||||||
|
13. Use with the GNU Affero General Public License.
|
||||||
|
|
||||||
|
Notwithstanding any other provision of this License, you have
|
||||||
|
permission to link or combine any covered work with a work licensed
|
||||||
|
under version 3 of the GNU Affero General Public License into a single
|
||||||
|
combined work, and to convey the resulting work. The terms of this
|
||||||
|
License will continue to apply to the part which is the covered work,
|
||||||
|
but the special requirements of the GNU Affero General Public License,
|
||||||
|
section 13, concerning interaction through a network will apply to the
|
||||||
|
combination as such.
|
||||||
|
|
||||||
|
14. Revised Versions of this License.
|
||||||
|
|
||||||
|
The Free Software Foundation may publish revised and/or new versions of
|
||||||
|
the GNU General Public License from time to time. Such new versions will
|
||||||
|
be similar in spirit to the present version, but may differ in detail to
|
||||||
|
address new problems or concerns.
|
||||||
|
|
||||||
|
Each version is given a distinguishing version number. If the
|
||||||
|
Program specifies that a certain numbered version of the GNU General
|
||||||
|
Public License "or any later version" applies to it, you have the
|
||||||
|
option of following the terms and conditions either of that numbered
|
||||||
|
version or of any later version published by the Free Software
|
||||||
|
Foundation. If the Program does not specify a version number of the
|
||||||
|
GNU General Public License, you may choose any version ever published
|
||||||
|
by the Free Software Foundation.
|
||||||
|
|
||||||
|
If the Program specifies that a proxy can decide which future
|
||||||
|
versions of the GNU General Public License can be used, that proxy's
|
||||||
|
public statement of acceptance of a version permanently authorizes you
|
||||||
|
to choose that version for the Program.
|
||||||
|
|
||||||
|
Later license versions may give you additional or different
|
||||||
|
permissions. However, no additional obligations are imposed on any
|
||||||
|
author or copyright holder as a result of your choosing to follow a
|
||||||
|
later version.
|
||||||
|
|
||||||
|
15. Disclaimer of Warranty.
|
||||||
|
|
||||||
|
THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
|
||||||
|
APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
|
||||||
|
HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
|
||||||
|
OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
|
||||||
|
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
||||||
|
PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
|
||||||
|
IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
|
||||||
|
ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
|
||||||
|
|
||||||
|
16. Limitation of Liability.
|
||||||
|
|
||||||
|
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
|
||||||
|
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
|
||||||
|
THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
|
||||||
|
GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
|
||||||
|
USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
|
||||||
|
DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
|
||||||
|
PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
|
||||||
|
EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
|
||||||
|
SUCH DAMAGES.
|
||||||
|
|
||||||
|
17. Interpretation of Sections 15 and 16.
|
||||||
|
|
||||||
|
If the disclaimer of warranty and limitation of liability provided
|
||||||
|
above cannot be given local legal effect according to their terms,
|
||||||
|
reviewing courts shall apply local law that most closely approximates
|
||||||
|
an absolute waiver of all civil liability in connection with the
|
||||||
|
Program, unless a warranty or assumption of liability accompanies a
|
||||||
|
copy of the Program in return for a fee.
|
||||||
|
|
||||||
|
END OF TERMS AND CONDITIONS
|
||||||
|
|
||||||
|
How to Apply These Terms to Your New Programs
|
||||||
|
|
||||||
|
If you develop a new program, and you want it to be of the greatest
|
||||||
|
possible use to the public, the best way to achieve this is to make it
|
||||||
|
free software which everyone can redistribute and change under these terms.
|
||||||
|
|
||||||
|
To do so, attach the following notices to the program. It is safest
|
||||||
|
to attach them to the start of each source file to most effectively
|
||||||
|
state the exclusion of warranty; and each file should have at least
|
||||||
|
the "copyright" line and a pointer to where the full notice is found.
|
||||||
|
|
||||||
|
<one line to give the program's name and a brief idea of what it does.>
|
||||||
|
Copyright (C) <year> <name of author>
|
||||||
|
|
||||||
|
This program is free software: you can redistribute it and/or modify
|
||||||
|
it under the terms of the GNU General Public License as published by
|
||||||
|
the Free Software Foundation, either version 3 of the License, or
|
||||||
|
(at your option) any later version.
|
||||||
|
|
||||||
|
This program is distributed in the hope that it will be useful,
|
||||||
|
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
GNU General Public License for more details.
|
||||||
|
|
||||||
|
You should have received a copy of the GNU General Public License
|
||||||
|
along with this program. If not, see <http://www.gnu.org/licenses/>.
|
||||||
|
|
||||||
|
Also add information on how to contact you by electronic and paper mail.
|
||||||
|
|
||||||
|
If the program does terminal interaction, make it output a short
|
||||||
|
notice like this when it starts in an interactive mode:
|
||||||
|
|
||||||
|
<program> Copyright (C) <year> <name of author>
|
||||||
|
This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
|
||||||
|
This is free software, and you are welcome to redistribute it
|
||||||
|
under certain conditions; type `show c' for details.
|
||||||
|
|
||||||
|
The hypothetical commands `show w' and `show c' should show the appropriate
|
||||||
|
parts of the General Public License. Of course, your program's commands
|
||||||
|
might be different; for a GUI interface, you would use an "about box".
|
||||||
|
|
||||||
|
You should also get your employer (if you work as a programmer) or school,
|
||||||
|
if any, to sign a "copyright disclaimer" for the program, if necessary.
|
||||||
|
For more information on this, and how to apply and follow the GNU GPL, see
|
||||||
|
<http://www.gnu.org/licenses/>.
|
||||||
|
|
||||||
|
The GNU General Public License does not permit incorporating your program
|
||||||
|
into proprietary programs. If your program is a subroutine library, you
|
||||||
|
may consider it more useful to permit linking proprietary applications with
|
||||||
|
the library. If this is what you want to do, use the GNU Lesser General
|
||||||
|
Public License instead of this License. But first, please read
|
||||||
|
<http://www.gnu.org/philosophy/why-not-lgpl.html>.
|
|
@ -0,0 +1,33 @@
|
||||||
|
#ifndef CHINESESEGMENTATIONPRIVATE_H
|
||||||
|
#define CHINESESEGMENTATIONPRIVATE_H
|
||||||
|
|
||||||
|
#include "chinese-segmentation.h"
|
||||||
|
#include "cppjieba/Jieba.hpp"
|
||||||
|
#include "cppjieba/KeywordExtractor.hpp"
|
||||||
|
|
||||||
|
class ChineseSegmentationPrivate
|
||||||
|
{
|
||||||
|
public:
|
||||||
|
explicit ChineseSegmentationPrivate(ChineseSegmentation *parent = nullptr);
|
||||||
|
~ChineseSegmentationPrivate();
|
||||||
|
vector<KeyWord> callSegment(const string& sentence);
|
||||||
|
|
||||||
|
vector<string> callMixSegmentCutStr(const string& sentence);
|
||||||
|
vector<Word> callMixSegmentCutWord(const string& sentence);
|
||||||
|
string lookUpTagOfWord(const string& word);
|
||||||
|
vector<pair<string, string>> getTagOfWordsInSentence(const string &sentence);
|
||||||
|
|
||||||
|
vector<Word> callFullSegment(const string& sentence);
|
||||||
|
|
||||||
|
vector<Word> callQuerySegment(const string& sentence);
|
||||||
|
|
||||||
|
vector<Word> callHMMSegment(const string& sentence);
|
||||||
|
|
||||||
|
vector<Word> callMPSegment(const string& sentence);
|
||||||
|
|
||||||
|
private:
|
||||||
|
cppjieba::Jieba *m_jieba;
|
||||||
|
ChineseSegmentation *q = nullptr;
|
||||||
|
};
|
||||||
|
|
||||||
|
#endif // CHINESESEGMENTATIONPRIVATE_H
|
|
@ -0,0 +1,162 @@
|
||||||
|
/*
|
||||||
|
* Copyright (C) 2020, KylinSoft Co., Ltd.
|
||||||
|
*
|
||||||
|
* This program is free software: you can redistribute it and/or modify
|
||||||
|
* it under the terms of the GNU General Public License as published by
|
||||||
|
* the Free Software Foundation, either version 3 of the License, or
|
||||||
|
* (at your option) any later version.
|
||||||
|
*
|
||||||
|
* This program is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
* GNU General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU General Public License
|
||||||
|
* along with this program. If not, see <https://www.gnu.org/licenses/>.
|
||||||
|
*
|
||||||
|
* Authors: zhangzihao <zhangzihao@kylinos.cn>
|
||||||
|
* Modified by: zhangpengfei <zhangpengfei@kylinos.cn>
|
||||||
|
*
|
||||||
|
*/
|
||||||
|
#include "chinese-segmentation.h"
|
||||||
|
#include "chinese-segmentation-private.h"
|
||||||
|
|
||||||
|
ChineseSegmentationPrivate::ChineseSegmentationPrivate(ChineseSegmentation *parent) : q(parent)
|
||||||
|
{
|
||||||
|
//const char * const DICT_PATH = "/usr/share/ukui-search/res/dict/jieba.dict.utf8";
|
||||||
|
const char * const HMM_PATH = "/usr/share/ukui-search/res/dict/hmm_model.utf8";
|
||||||
|
//const char * const USER_DICT_PATH = "/usr/share/ukui-search/res/dict/user.dict.utf8";
|
||||||
|
//const char * const IDF_PATH = "/usr/share/ukui-search/res/dict/idf.utf8";
|
||||||
|
const char * const STOP_WORD_PATH = "/usr/share/ukui-search/res/dict/stop_words.utf8";
|
||||||
|
m_jieba = new cppjieba::Jieba(DICT_PATH,
|
||||||
|
HMM_PATH,
|
||||||
|
USER_DICT_PATH,
|
||||||
|
IDF_DICT_PATH,
|
||||||
|
STOP_WORD_PATH,
|
||||||
|
"");
|
||||||
|
}
|
||||||
|
|
||||||
|
ChineseSegmentationPrivate::~ChineseSegmentationPrivate() {
|
||||||
|
if(m_jieba)
|
||||||
|
delete m_jieba;
|
||||||
|
m_jieba = nullptr;
|
||||||
|
}
|
||||||
|
|
||||||
|
vector<KeyWord> ChineseSegmentationPrivate::callSegment(const string &sentence) {
|
||||||
|
const size_t topk = static_cast<size_t>(-1); // wraps to the largest size_t value, so topk imposes no practical cap
|
||||||
|
vector<KeyWord> keywordres;
|
||||||
|
ChineseSegmentationPrivate::m_jieba->extractor.Extract(sentence, keywordres, topk);
|
||||||
|
|
||||||
|
return keywordres;
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
vector<string> ChineseSegmentationPrivate::callMixSegmentCutStr(const string &sentence)
|
||||||
|
{
|
||||||
|
vector<string> keywordres;
|
||||||
|
ChineseSegmentationPrivate::m_jieba->Cut(sentence, keywordres);
|
||||||
|
return keywordres;
|
||||||
|
}
|
||||||
|
|
||||||
|
vector<Word> ChineseSegmentationPrivate::callMixSegmentCutWord(const string &sentence)
|
||||||
|
{
|
||||||
|
vector<Word> keywordres;
|
||||||
|
ChineseSegmentationPrivate::m_jieba->Cut(sentence, keywordres);
|
||||||
|
return keywordres;
|
||||||
|
}
|
||||||
|
|
||||||
|
string ChineseSegmentationPrivate::lookUpTagOfWord(const string &word)
|
||||||
|
{
|
||||||
|
return ChineseSegmentationPrivate::m_jieba->LookupTag(word);
|
||||||
|
}
|
||||||
|
|
||||||
|
vector<pair<string, string>> ChineseSegmentationPrivate::getTagOfWordsInSentence(const string &sentence)
|
||||||
|
{
|
||||||
|
vector<pair<string, string>> words;
|
||||||
|
ChineseSegmentationPrivate::m_jieba->Tag(sentence, words);
|
||||||
|
return words;
|
||||||
|
}
|
||||||
|
|
||||||
|
vector<Word> ChineseSegmentationPrivate::callFullSegment(const string &sentence)
|
||||||
|
{
|
||||||
|
vector<Word> keywordres;
|
||||||
|
ChineseSegmentationPrivate::m_jieba->CutAll(sentence, keywordres);
|
||||||
|
return keywordres;
|
||||||
|
}
|
||||||
|
|
||||||
|
vector<Word> ChineseSegmentationPrivate::callQuerySegment(const string &sentence)
|
||||||
|
{
|
||||||
|
vector<Word> keywordres;
|
||||||
|
ChineseSegmentationPrivate::m_jieba->CutForSearch(sentence, keywordres);
|
||||||
|
return keywordres;
|
||||||
|
}
|
||||||
|
|
||||||
|
vector<Word> ChineseSegmentationPrivate::callHMMSegment(const string &sentence)
|
||||||
|
{
|
||||||
|
vector<Word> keywordres;
|
||||||
|
ChineseSegmentationPrivate::m_jieba->CutHMM(sentence, keywordres);
|
||||||
|
return keywordres;
|
||||||
|
}
|
||||||
|
|
||||||
|
vector<Word> ChineseSegmentationPrivate::callMPSegment(const string &sentence)
|
||||||
|
{
|
||||||
|
size_t maxWordLen = 512;
|
||||||
|
vector<Word> keywordres;
|
||||||
|
ChineseSegmentationPrivate::m_jieba->CutSmall(sentence, keywordres, maxWordLen);
|
||||||
|
return keywordres;
|
||||||
|
}
|
||||||
|
|
||||||
|
ChineseSegmentation *ChineseSegmentation::getInstance()
|
||||||
|
{
|
||||||
|
static ChineseSegmentation *global_instance_chinese_segmentation = new ChineseSegmentation;
|
||||||
|
return global_instance_chinese_segmentation;
|
||||||
|
}
|
||||||
|
|
||||||
|
vector<KeyWord> ChineseSegmentation::callSegment(const string &sentence)
|
||||||
|
{
|
||||||
|
return d->callSegment(sentence);
|
||||||
|
}
|
||||||
|
|
||||||
|
vector<string> ChineseSegmentation::callMixSegmentCutStr(const string &sentence)
|
||||||
|
{
|
||||||
|
return d->callMixSegmentCutStr(sentence);
|
||||||
|
}
|
||||||
|
|
||||||
|
vector<Word> ChineseSegmentation::callMixSegmentCutWord(const string &str)
|
||||||
|
{
|
||||||
|
return d->callMixSegmentCutWord(str);
|
||||||
|
}
|
||||||
|
|
||||||
|
string ChineseSegmentation::lookUpTagOfWord(const string &word)
|
||||||
|
{
|
||||||
|
return d->lookUpTagOfWord(word);
|
||||||
|
}
|
||||||
|
|
||||||
|
vector<pair<string, string> > ChineseSegmentation::getTagOfWordsInSentence(const string &sentence)
|
||||||
|
{
|
||||||
|
return d->getTagOfWordsInSentence(sentence);
|
||||||
|
}
|
||||||
|
|
||||||
|
vector<Word> ChineseSegmentation::callFullSegment(const string &sentence)
|
||||||
|
{
|
||||||
|
return d->callFullSegment(sentence);
|
||||||
|
}
|
||||||
|
|
||||||
|
vector<Word> ChineseSegmentation::callQuerySegment(const string &sentence)
|
||||||
|
{
|
||||||
|
return d->callQuerySegment(sentence);
|
||||||
|
}
|
||||||
|
|
||||||
|
vector<Word> ChineseSegmentation::callHMMSegment(const string &sentence)
|
||||||
|
{
|
||||||
|
return d->callHMMSegment(sentence);
|
||||||
|
}
|
||||||
|
|
||||||
|
vector<Word> ChineseSegmentation::callMPSegment(const string &sentence)
|
||||||
|
{
|
||||||
|
return d->callMPSegment(sentence);
|
||||||
|
}
|
||||||
|
|
||||||
|
ChineseSegmentation::ChineseSegmentation() : d(new ChineseSegmentationPrivate)
|
||||||
|
{
|
||||||
|
}
|
|
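Note on `callSegment` in the file above: it requests the largest `size_t` value as `topk` (a `-1` converted to `size_t`). The following is a minimal sketch of why such a value behaves as "return everything", assuming the extractor clamps the requested count to the number of candidates. That clamping is an assumption about the extractor and is not shown in this diff; `topK` is a hypothetical helper, not part of cppjieba or ukui-search.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Hypothetical helper: keep the k largest weights, sorted in descending order.
// A k larger than the input size is clamped, so SIZE_MAX means "keep all".
inline std::vector<double> topK(std::vector<double> weights, std::size_t k) {
    k = std::min(k, weights.size());
    std::partial_sort(weights.begin(), weights.begin() + k, weights.end(),
                      std::greater<double>());
    weights.resize(k);
    return weights;
}
```

Calling `topK(weights, static_cast<std::size_t>(-1))` keeps every element, while `topK(weights, 2)` keeps only the two largest.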
@ -0,0 +1,116 @@
|
||||||
|
/*
|
||||||
|
* Copyright (C) 2020, KylinSoft Co., Ltd.
|
||||||
|
*
|
||||||
|
* This program is free software: you can redistribute it and/or modify
|
||||||
|
* it under the terms of the GNU General Public License as published by
|
||||||
|
* the Free Software Foundation, either version 3 of the License, or
|
||||||
|
* (at your option) any later version.
|
||||||
|
*
|
||||||
|
* This program is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
* GNU General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU General Public License
|
||||||
|
* along with this program. If not, see <https://www.gnu.org/licenses/>.
|
||||||
|
*
|
||||||
|
* Authors: zhangzihao <zhangzihao@kylinos.cn>
|
||||||
|
* Modified by: zhangpengfei <zhangpengfei@kylinos.cn>
|
||||||
|
*
|
||||||
|
*/
|
||||||
|
#ifndef CHINESESEGMENTATION_H
|
||||||
|
#define CHINESESEGMENTATION_H
|
||||||
|
|
||||||
|
#include "libchinese-segmentation_global.h"
|
||||||
|
#include "common-struct.h"
|
||||||
|
|
||||||
|
class ChineseSegmentationPrivate;
|
||||||
|
class CHINESESEGMENTATION_EXPORT ChineseSegmentation {
|
||||||
|
public:
|
||||||
|
static ChineseSegmentation *getInstance();
|
||||||
|
|
||||||
|
/**
|
||||||
|
* @brief ChineseSegmentation::callSegment
|
||||||
|
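* Calls the extractor to extract keywords: the sentence is first segmented with the Mix method, then keywords are extracted using the IDF dictionary. Only keywords of two or more characters are included.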
|
||||||
|
*
|
||||||
|
* @param sentence the sentence to extract keywords from
|
||||||
|
* @return vector<KeyWord> container holding the information of the extracted keywords
|
||||||
|
*/
|
||||||
|
vector<KeyWord> callSegment(const string &sentence);
|
||||||
|
|
||||||
|
/**
|
||||||
|
* @brief ChineseSegmentation::callMixSegmentCutStr
|
||||||
|
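* Segments using the Mix method: the Maximum Probability (MP) method produces an initial segmentation, which the Hidden Markov Model (HMM) then refines. It can cut both dictionary words and out-of-vocabulary words, and the results are fairly accurate.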
|
||||||
|
*
|
||||||
|
* @param sentence the sentence to segment
|
||||||
|
* @return vector<string> container holding only the text of each segmented word
|
||||||
|
*/
|
||||||
|
vector<string> callMixSegmentCutStr(const string& sentence);
|
||||||
|
|
||||||
|
/**
|
||||||
|
* @brief ChineseSegmentation::callMixSegmentCutWord
|
||||||
|
* Same functionality as callMixSegmentCutStr
|
||||||
|
* @param str the sentence to segment
|
||||||
|
* @return vector<Word> container holding the full information of each segmented word
|
||||||
|
*/
|
||||||
|
vector<Word> callMixSegmentCutWord(const string& str);
|
||||||
|
|
||||||
|
/**
|
||||||
|
* @brief ChineseSegmentation::lookUpTagOfWord
|
||||||
|
* Looks up the part of speech of word
|
||||||
|
* @param word the word whose part of speech is looked up
|
||||||
|
* @return string the part of speech of word
|
||||||
|
*/
|
||||||
|
string lookUpTagOfWord(const string& word);
|
||||||
|
|
||||||
|
/**
|
||||||
|
* @brief ChineseSegmentation::getTagOfWordsInSentence
|
||||||
|
* Segments with Mix, then gets the part of speech of each word
|
||||||
|
* @param sentence the sentence to segment
|
||||||
|
* @return vector<pair<string, string>> the text of each segmented word (first) and its part of speech (second)
|
||||||
|
*/
|
||||||
|
vector<pair<string, string>> getTagOfWordsInSentence(const string &sentence);
|
||||||
|
|
||||||
|
/**
|
||||||
|
* @brief ChineseSegmentation::callFullSegment
|
||||||
|
* Segments using Full, which cuts out every word found in the dictionary.
|
||||||
|
* @param sentence the sentence to segment
|
||||||
|
* @return vector<Word> container holding the full information of each segmented word
|
||||||
|
*/
|
||||||
|
vector<Word> callFullSegment(const string& sentence);
|
||||||
|
|
||||||
|
/**
|
||||||
|
* @brief ChineseSegmentation::callQuerySegment
|
||||||
|
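* Segments using Query: Mix is applied first, then Full is applied to long words. The result is the most precise, but it also yields the largest number of words.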
|
||||||
|
* @param sentence the sentence to segment
|
||||||
|
* @return vector<Word> container holding the full information of each segmented word
|
||||||
|
*/
|
||||||
|
vector<Word> callQuerySegment(const string& sentence);
|
||||||
|
|
||||||
|
/**
|
||||||
|
* @brief ChineseSegmentation::callHMMSegment
|
||||||
|
* Segments using the Hidden Markov Model (HMM)
|
||||||
|
* @param sentence the sentence to segment
|
||||||
|
* @return vector<Word> container holding the full information of each segmented word
|
||||||
|
*/
|
||||||
|
vector<Word> callHMMSegment(const string& sentence);
|
||||||
|
|
||||||
|
/**
|
||||||
|
* @brief ChineseSegmentation::callMPSegment
|
||||||
|
* Segments using the Maximum Probability (MP) method
|
||||||
|
* @param sentence the sentence to segment
|
||||||
|
* @return vector<Word> container holding the full information of each segmented word
|
||||||
|
*/
|
||||||
|
vector<Word> callMPSegment(const string& sentence);
|
||||||
|
|
||||||
|
private:
|
||||||
|
explicit ChineseSegmentation();
|
||||||
|
~ChineseSegmentation() = default;
|
||||||
|
ChineseSegmentation(const ChineseSegmentation&) = delete;
|
||||||
|
ChineseSegmentation& operator =(const ChineseSegmentation&) = delete;
|
||||||
|
|
||||||
|
private:
|
||||||
|
ChineseSegmentationPrivate *d = nullptr;
|
||||||
|
};
|
||||||
|
|
||||||
|
#endif // CHINESESEGMENTATION_H
|
|
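Note on the header above: `ChineseSegmentation` is a singleton (private constructor, deleted copy operations, `getInstance()`) that forwards every call to a private implementation through the `d` pointer. The following is a minimal, self-contained sketch of that structure. `Segmenter`, `SegmenterPrivate` and `cut` are hypothetical names, and the private class splits on spaces as a stand-in for cppjieba, so this illustrates the pattern only, not the library's behavior.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the private implementation class.
// It splits on spaces instead of calling a real segmenter.
class SegmenterPrivate {
public:
    std::vector<std::string> cut(const std::string &sentence) const {
        std::vector<std::string> out;
        std::string current;
        for (char c : sentence) {
            if (c == ' ') {
                if (!current.empty()) out.push_back(current);
                current.clear();
            } else {
                current += c;
            }
        }
        if (!current.empty()) out.push_back(current);
        return out;
    }
};

// Public facade: one process-wide instance, all work delegated to d.
class Segmenter {
public:
    static Segmenter *getInstance() {
        // Created on first use and intentionally never deleted.
        static Segmenter *instance = new Segmenter;
        return instance;
    }
    std::vector<std::string> cut(const std::string &sentence) {
        return d->cut(sentence);
    }

private:
    Segmenter() : d(new SegmenterPrivate) {}
    ~Segmenter() = default;
    Segmenter(const Segmenter &) = delete;
    Segmenter &operator=(const Segmenter &) = delete;

    SegmenterPrivate *d = nullptr;
};
```

Callers write `Segmenter::getInstance()->cut("...")`. Because the implementation type appears in the public class only as a pointer, a real header can forward-declare it and keep heavy includes out of client code.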
@ -0,0 +1,52 @@
|
||||||
|
#ifndef COMMONSTRUCT_H
|
||||||
|
#define COMMONSTRUCT_H
|
||||||
|
|
||||||
|
#include <cstdint>
#include <string>
|
||||||
|
#include <vector>
|
||||||
|
|
||||||
|
using namespace std;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* @brief The KeyWord struct
|
||||||
|
*
|
||||||
|
* @property word the content of keyword
|
||||||
|
* @property offsets the Unicode offsets of the keyword; can be used to locate the word in a sentence
|
||||||
|
* @property weight the weight of the keyword
|
||||||
|
*/
|
||||||
|
|
||||||
|
struct KeyWord {
|
||||||
|
string word;
|
||||||
|
vector<size_t> offsets;
|
||||||
|
double weight;
|
||||||
|
~KeyWord() {
|
||||||
|
word.clear();
|
||||||
|
offsets.clear();
|
||||||
|
offsets.shrink_to_fit();
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
/**
|
||||||
|
* @brief The Word struct
|
||||||
|
*
|
||||||
|
* @property word the content of word
|
||||||
|
* @property offset the byte offset of the word in the UTF-8 sentence (a Chinese character takes 3 bytes, an ASCII character 1), can be used to locate the word in a sentence
|
||||||
|
* @property unicode_offset the Unicode offset of the word
|
||||||
|
* @property unicode_length the Unicode length of the word
|
||||||
|
*/
|
||||||
|
struct Word {
|
||||||
|
string word;
|
||||||
|
uint32_t offset;
|
||||||
|
uint32_t unicode_offset;
|
||||||
|
uint32_t unicode_length;
|
||||||
|
Word(const string& w, uint32_t o)
|
||||||
|
: word(w), offset(o) {
|
||||||
|
}
|
||||||
|
Word(const string& w, uint32_t o, uint32_t unicode_offset, uint32_t unicode_length)
|
||||||
|
: word(w), offset(o), unicode_offset(unicode_offset), unicode_length(unicode_length) {
|
||||||
|
}
|
||||||
|
~Word() {
|
||||||
|
word.clear();
|
||||||
|
}
|
||||||
|
}; // struct Word
|
||||||
|
|
||||||
|
#endif // COMMONSTRUCT_H
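
The `Word` struct above carries both a byte offset (`offset`) and a code-point offset (`unicode_offset`). The following is a minimal sketch of how the two differ for UTF-8 text; `CharPos` and `charPositions` are hypothetical names introduced for illustration and are not part of the library, and the input is assumed to be valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the offset fields of Word.
struct CharPos {
    std::uint32_t offset;          // byte offset in the UTF-8 string
    std::uint32_t unicode_offset;  // index counted in code points
};

// Walks a UTF-8 string and records, for every code point, its byte offset
// and its code-point index. Precondition: the input is valid UTF-8.
std::vector<CharPos> charPositions(const std::string& s) {
    std::vector<CharPos> out;
    std::uint32_t cp = 0;
    for (std::size_t i = 0; i < s.size(); ++cp) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len = 1;            // ASCII
        if (c >= 0xF0) len = 4;         // 4-byte sequence
        else if (c >= 0xE0) len = 3;    // 3-byte sequence (most Chinese characters)
        else if (c >= 0xC0) len = 2;    // 2-byte sequence
        assert(i + len <= s.size());    // truncated input violates the precondition
        out.push_back({static_cast<std::uint32_t>(i), cp});
        i += len;
    }
    return out;
}
```

For a string made of an ASCII letter, one Chinese character, and another ASCII letter, the third character sits at byte offset 4 but at code-point offset 2, which is the distinction the two fields of `Word` encode.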
|
|
@ -0,0 +1,641 @@
|
||||||
|
#pragma once
|
||||||
|
|
||||||
|
#include <stdint.h>
|
||||||
|
#include <unistd.h>
|
||||||
|
#include <fcntl.h>
|
||||||
|
#include <sys/mman.h>
|
||||||
|
#include <sys/types.h>
|
||||||
|
#include <sys/stat.h>
|
||||||
|
#include <QDebug>
|
||||||
|
|
||||||
|
#include <algorithm>
|
||||||
|
#include <utility>
|
||||||
|
|
||||||
|
#include "limonp/Md5.hpp"
|
||||||
|
#include "Unicode.hpp"
|
||||||
|
//#define USE_DARTS_CLONE
|
||||||
|
#ifdef USE_DARTS_CLONE
|
||||||
|
#include "../storage-base/darts-clone/darts.h"
|
||||||
|
#else
|
||||||
|
#include "../storage-base/cedar/cedar.h"
|
||||||
|
#endif
|
||||||
|
|
||||||
|
namespace cppjieba {
|
||||||
|
|
||||||
|
using std::pair;
|
||||||
|
|
||||||
|
struct DatElement {
|
||||||
|
string word;
|
||||||
|
string tag;
|
||||||
|
double weight = 0;
|
||||||
|
|
||||||
|
bool operator < (const DatElement & b) const {
|
||||||
|
if (word == b.word) {
|
||||||
|
return this->weight > b.weight;
|
||||||
|
}
|
||||||
|
|
||||||
|
return this->word < b.word;
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
struct IdfElement {
|
||||||
|
string word;
|
||||||
|
double idf = 0;
|
||||||
|
|
||||||
|
bool operator < (const IdfElement & b) const {
|
||||||
|
if (word == b.word) {
|
||||||
|
return this->idf > b.idf;
|
||||||
|
}
|
||||||
|
|
||||||
|
return this->word < b.word;
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
struct PinYinElement
|
||||||
|
{
|
||||||
|
string word;
|
||||||
|
string tag;
|
||||||
|
|
||||||
|
bool operator < (const PinYinElement & b) const {
|
||||||
|
return this->word < b.word;
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
inline std::ostream & operator << (std::ostream& os, const DatElement & elem) {
|
||||||
|
return os << "word=" << elem.word << "/tag=" << elem.tag << "/weight=" << elem.weight;
|
||||||
|
}
|
||||||
|
|
||||||
|
struct PinYinMemElem {
|
||||||
|
char tag[6] = {};
|
||||||
|
|
||||||
|
void SetTag(const string & str) {
|
||||||
|
memset(&tag[0], 0, sizeof(tag));
|
||||||
|
strncpy(&tag[0], str.c_str(), std::min(str.size(), sizeof(tag) - 1));
|
||||||
|
}
|
||||||
|
|
||||||
|
string GetTag() const {
|
||||||
|
return &tag[0];
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
inline std::ostream & operator << (std::ostream& os, const DatMemElem & elem) {
|
||||||
|
return os << "/tag=" << elem.GetTag() << "/weight=" << elem.weight;
|
||||||
|
}
|
||||||
|
#ifdef USE_DARTS_CLONE
|
||||||
|
typedef Darts::DoubleArray JiebaDAT;
|
||||||
|
#else
|
||||||
|
typedef cedar::da<int, -1, -2, false> JiebaDAT;
|
||||||
|
#endif
|
||||||
|
|
||||||
|
|
||||||
|
struct CacheFileHeader {
|
||||||
|
char md5_hex[32] = {};
|
||||||
|
double min_weight = 0;
|
||||||
|
uint32_t elements_num = 0;
|
||||||
|
uint32_t dat_size = 0;
|
||||||
|
};
|
||||||
|
|
||||||
|
static_assert(sizeof(DatMemElem) == 16, "DatMemElem length invalid");
|
||||||
|
static_assert((sizeof(CacheFileHeader) % sizeof(DatMemElem)) == 0, "DatMemElem CacheFileHeader length equal");
|
||||||
|
|
||||||
|
|
||||||
|
class DatTrie {
|
||||||
|
public:
|
||||||
|
DatTrie() {}
|
||||||
|
~DatTrie() {
|
||||||
|
::munmap(mmap_addr_, mmap_length_);
|
||||||
|
mmap_addr_ = nullptr;
|
||||||
|
mmap_length_ = 0;
|
||||||
|
|
||||||
|
::close(mmap_fd_);
|
||||||
|
mmap_fd_ = -1;
|
||||||
|
}
|
||||||
|
|
||||||
|
const DatMemElem * Find(const string & key) const {
|
||||||
|
#ifdef USE_DARTS_CLONE
|
||||||
|
JiebaDAT::result_pair_type find_result;
|
||||||
|
dat_.exactMatchSearch(key.c_str(), find_result);
|
||||||
|
|
||||||
|
if ((0 == find_result.length) || (find_result.value < 0) || ((size_t)find_result.value >= elements_num_)) {
|
||||||
|
return nullptr;
|
||||||
|
}
|
||||||
|
|
||||||
|
return &elements_ptr_[ find_result.value ];
|
||||||
|
#else
|
||||||
|
int result = dat_.exactMatchSearch<int>(key.c_str());
|
||||||
|
if (result < 0)
|
||||||
|
return nullptr;
|
||||||
|
return &elements_ptr_[result];
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
|
||||||
|
double Find(const string & key, std::size_t length, std::size_t node_pos) const {
|
||||||
|
#ifdef USE_DARTS_CLONE
|
||||||
|
JiebaDAT::result_pair_type find_result;
|
||||||
|
dat_.exactMatchSearch(key.c_str(), find_result, length, node_pos);
|
||||||
|
|
||||||
|
if ((0 == find_result.length) || (find_result.value < 0) || ((size_t)find_result.value >= elements_num_)) {
|
||||||
|
return -1;
|
||||||
|
}
|
||||||
|
|
||||||
|
return idf_elements_ptr_[ find_result.value ];
|
||||||
|
#else
|
||||||
|
int result = dat_.exactMatchSearch<int>(key.c_str(), length, node_pos);
|
||||||
|
if (result < 0)
|
||||||
|
return -1;
|
||||||
|
return idf_elements_ptr_[result];
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
|
||||||
|
const PinYinMemElem * PinYinFind(const string & key) const {
|
||||||
|
#ifdef USE_DARTS_CLONE
|
||||||
|
JiebaDAT::result_pair_type find_result;
|
||||||
|
dat_.exactMatchSearch(key.c_str(), find_result);
|
||||||
|
|
||||||
|
if ((0 == find_result.length) || (find_result.value < 0) || ((size_t)find_result.value >= elements_num_)) {
|
||||||
|
return nullptr;
|
||||||
|
}
|
||||||
|
|
||||||
|
return &pinyin_elements_ptr_[ find_result.value ];
|
||||||
|
#else
|
||||||
|
int result = dat_.exactMatchSearch<int>(key.c_str());
|
||||||
|
if (result < 0)
|
||||||
|
return nullptr;
|
||||||
|
return &pinyin_elements_ptr_[result];
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
|
||||||
|
void Find(RuneStrArray::const_iterator begin, RuneStrArray::const_iterator end,
|
||||||
|
vector<struct DatDag>&res, size_t max_word_len) const {
|
||||||
|
|
||||||
|
res.clear();
|
||||||
|
res.resize(end - begin);
|
||||||
|
|
||||||
|
string text_str;
|
||||||
|
EncodeRunesToString(begin, end, text_str);
|
||||||
|
|
||||||
|
static const size_t max_num = 128;
|
||||||
|
JiebaDAT::result_pair_type result_pairs[max_num] = {};
|
||||||
|
|
||||||
|
for (size_t i = 0, begin_pos = 0; i < size_t(end - begin); i++) {
|
||||||
|
|
||||||
|
std::size_t num_results = dat_.commonPrefixSearch(&text_str[begin_pos], &result_pairs[0], max_num);
|
||||||
|
|
||||||
|
res[i].nexts.push_back(pair<size_t, const DatMemElem *>(i + 1, nullptr));
|
||||||
|
|
||||||
|
for (std::size_t idx = 0; idx < num_results; ++idx) {
|
||||||
|
auto & match = result_pairs[idx];
|
||||||
|
|
||||||
|
if ((match.value < 0) || ((size_t)match.value >= elements_num_)) {
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
|
auto const char_num = Utf8CharNum(&text_str[begin_pos], match.length);
|
||||||
|
|
||||||
|
if (char_num > max_word_len) {
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
|
auto pValue = &elements_ptr_[match.value];
|
||||||
|
|
||||||
|
if (1 == char_num) {
|
||||||
|
res[i].nexts[0].second = pValue;
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
|
res[i].nexts.push_back(pair<size_t, const DatMemElem *>(i + char_num, pValue));
|
||||||
|
}
|
||||||
|
|
||||||
|
begin_pos += limonp::UnicodeToUtf8Bytes((begin + i)->rune);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/*
|
||||||
|
void Find_Reverse(RuneStrArray::const_iterator begin, RuneStrArray::const_iterator end,
|
||||||
|
vector<struct DatDag>&res, size_t max_word_len) const {
|
||||||
|
|
||||||
|
res.clear();
|
||||||
|
res.resize(end - begin);
|
||||||
|
|
||||||
|
string text_str;
|
||||||
|
EncodeRunesToString(begin, end, text_str);
|
||||||
|
|
||||||
|
static const size_t max_num = 128;
|
||||||
|
JiebaDAT::result_pair_type result_pairs[max_num] = {};
|
||||||
|
|
||||||
|
size_t str_size = end - begin;
|
||||||
|
for (size_t i = 0, begin_pos = text_str.size(); i < str_size; i++) {
|
||||||
|
|
||||||
|
begin_pos -= (end - i - 1)->len;
|
||||||
|
std::size_t num_results = dat_.commonPrefixSearch(&text_str[begin_pos], &result_pairs[0], max_num);
|
||||||
|
res[str_size - i - 1].nexts.push_back(pair<size_t, const DatMemElem *>(str_size - i, nullptr));
|
||||||
|
|
||||||
|
for (std::size_t idx = 0; idx < num_results; ++idx) {
|
||||||
|
auto & match = result_pairs[idx];
|
||||||
|
if ((match.value < 0) || ((size_t)match.value >= elements_num_)) {
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
|
auto const char_num = Utf8CharNum(&text_str[begin_pos], match.length);
|
||||||
|
|
||||||
|
if (char_num > max_word_len) {
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
|
auto pValue = &elements_ptr_[match.value];
|
||||||
|
|
||||||
|
if (1 == char_num) {
|
||||||
|
res[str_size - i - 1].nexts[0].second = pValue;
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
|
res[str_size - i - 1].nexts.push_back(pair<size_t, const DatMemElem *>(str_size - 1 - i + char_num, pValue));
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}*/
|
||||||
|
|
||||||
|
void Find(RuneStrArray::const_iterator begin, RuneStrArray::const_iterator end,
|
||||||
|
vector<WordRange>& words, size_t max_word_len) const {
|
||||||
|
|
||||||
|
string text_str;
|
||||||
|
EncodeRunesToString(begin, end, text_str);
|
||||||
|
|
||||||
|
static const size_t max_num = 128;
|
||||||
|
JiebaDAT::result_pair_type result_pairs[max_num] = {}; // holds the dictionary lookup results
|
||||||
|
size_t str_size = end - begin;
|
||||||
|
double max_weight[str_size]; // best path weight from each position to the end, computed backwards
|
||||||
|
for (size_t i = 0; i<str_size; i++) {
|
||||||
|
max_weight[i] = -3.14e+100;
|
||||||
|
}
|
||||||
|
int max_next[str_size]; // segmentation result of the dynamic programming pass (next word boundary per position)
|
||||||
|
//memset(max_next,-1,str_size);
|
||||||
|
|
||||||
|
double val(0);
|
||||||
|
for (size_t i = 0, begin_pos = text_str.size(); i < str_size; i++) {
|
||||||
|
size_t nextPos = str_size - i; // iterate from the end backwards
|
||||||
|
begin_pos -= (end - i - 1)->len;
|
||||||
|
|
||||||
|
std::size_t num_results = dat_.commonPrefixSearch(&text_str[begin_pos], &result_pairs[0], max_num);
|
||||||
|
if (0 == num_results) { // not in the dictionary: treat the character as a word of its own
|
||||||
|
val = min_weight_;
|
||||||
|
|
||||||
|
if (nextPos < str_size) {
|
||||||
|
val += max_weight[nextPos];
|
||||||
|
}
|
||||||
|
if ((nextPos <= str_size) && (val > max_weight[nextPos - 1])) {
|
||||||
|
max_weight[nextPos - 1] = val;
|
||||||
|
max_next[nextPos - 1] = nextPos;
|
||||||
|
}
|
||||||
|
} else { // in the dictionary: compute the maximum-probability path over all matches
|
||||||
|
for (std::size_t idx = 0; idx < num_results; ++idx) {
|
||||||
|
auto & match = result_pairs[idx];
|
||||||
|
if ((match.value < 0) || ((size_t)match.value >= elements_num_)) {
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
auto const char_num = Utf8CharNum(&text_str[begin_pos], match.length);
|
||||||
|
if (char_num > max_word_len) {
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
auto pValue = &elements_ptr_[match.value];
|
||||||
|
|
||||||
|
val = pValue->weight;
|
||||||
|
if (1 == char_num) {
|
||||||
|
if (nextPos < str_size) {
|
||||||
|
val += max_weight[nextPos];
|
||||||
|
}
|
||||||
|
if ((nextPos <= str_size) && (val > max_weight[nextPos - 1])) {
|
||||||
|
max_weight[nextPos - 1] = val;
|
||||||
|
max_next[nextPos - 1] = nextPos;
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
if (nextPos - 1 + char_num < str_size) {
|
||||||
|
val += max_weight[nextPos - 1 + char_num];
|
||||||
|
}
|
||||||
|
if ((nextPos - 1 + char_num <= str_size) && (val > max_weight[nextPos - 1])) {
|
||||||
|
max_weight[nextPos - 1] = val;
|
||||||
|
max_next[nextPos - 1] = nextPos - 1 + char_num;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
for (size_t i = 0; i < str_size;) { // collect the dynamic programming result
|
||||||
|
assert(max_next[i] > i);
|
||||||
|
assert(max_next[i] <= str_size);
|
||||||
|
WordRange wr(begin + i, begin + max_next[i] - 1);
|
||||||
|
words.push_back(wr);
|
||||||
|
i = max_next[i];
|
||||||
|
}
|
||||||
|
}
|
||||||
|
double GetMinWeight() const {
|
||||||
|
return min_weight_;
|
||||||
|
}
|
||||||
|
|
||||||
|
void SetMinWeight(double d) {
|
||||||
|
min_weight_ = d ;
|
||||||
|
}
|
||||||
|
|
||||||
|
bool InitBuildDat(vector<DatElement>& elements, const string & dat_cache_file, const string & md5) {
|
||||||
|
BuildDatCache(elements, dat_cache_file, md5);
|
||||||
|
return InitAttachDat(dat_cache_file, md5);
|
||||||
|
}
|
||||||
|
|
||||||
|
bool InitBuildDat(vector<IdfElement>& elements, const string & dat_cache_file, const string & md5) {
|
||||||
|
BuildDatCache(elements, dat_cache_file, md5);
|
||||||
|
return InitIdfAttachDat(dat_cache_file, md5);
|
||||||
|
}
|
||||||
|
|
||||||
|
bool InitBuildDat(vector<PinYinElement>& elements, const string & dat_cache_file, const string & md5) {
|
||||||
|
BuildDatCache(elements, dat_cache_file, md5);
|
||||||
|
return InitPinYinAttachDat(dat_cache_file, md5);
|
||||||
|
}
|
||||||
|
|
||||||
|
bool InitAttachDat(const string & dat_cache_file, const string & md5) {
|
||||||
|
mmap_fd_ = ::open(dat_cache_file.c_str(), O_RDONLY);
|
||||||
|
|
||||||
|
if (mmap_fd_ < 0) {
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
const auto seek_off = ::lseek(mmap_fd_, 0, SEEK_END);
|
||||||
|
assert(seek_off >= 0);
|
||||||
|
mmap_length_ = seek_off;
|
||||||
|
|
||||||
|
mmap_addr_ = reinterpret_cast<char *>(mmap(NULL, mmap_length_, PROT_READ, MAP_SHARED, mmap_fd_, 0));
|
||||||
|
assert(MAP_FAILED != mmap_addr_);
|
||||||
|
|
||||||
|
assert(mmap_length_ >= sizeof(CacheFileHeader));
|
||||||
|
CacheFileHeader & header = *reinterpret_cast<CacheFileHeader*>(mmap_addr_);
|
||||||
|
elements_num_ = header.elements_num;
|
||||||
|
min_weight_ = header.min_weight;
|
||||||
|
assert(sizeof(header.md5_hex) == md5.size());
|
||||||
|
|
||||||
|
if (0 != memcmp(&header.md5_hex[0], md5.c_str(), md5.size())) {
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
assert(mmap_length_ == sizeof(header) + header.elements_num * sizeof(DatMemElem) + header.dat_size * dat_.unit_size());
|
||||||
|
elements_ptr_ = (const DatMemElem *)(mmap_addr_ + sizeof(header));
|
||||||
|
char * dat_ptr = mmap_addr_ + sizeof(header) + sizeof(DatMemElem) * elements_num_;
|
||||||
|
dat_.set_array(dat_ptr, header.dat_size);
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
|
||||||
|
bool InitIdfAttachDat(const string & dat_cache_file, const string & md5) {
|
||||||
|
mmap_fd_ = ::open(dat_cache_file.c_str(), O_RDONLY);
|
||||||
|
|
||||||
|
if (mmap_fd_ < 0) {
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
const auto seek_off = ::lseek(mmap_fd_, 0, SEEK_END);
|
||||||
|
assert(seek_off >= 0);
|
||||||
|
mmap_length_ = seek_off;
|
||||||
|
|
||||||
|
mmap_addr_ = reinterpret_cast<char *>(mmap(NULL, mmap_length_, PROT_READ, MAP_SHARED, mmap_fd_, 0));
|
||||||
|
assert(MAP_FAILED != mmap_addr_);
|
||||||
|
|
||||||
|
assert(mmap_length_ >= sizeof(CacheFileHeader));
|
||||||
|
CacheFileHeader & header = *reinterpret_cast<CacheFileHeader*>(mmap_addr_);
|
||||||
|
elements_num_ = header.elements_num;
|
||||||
|
min_weight_ = header.min_weight;
|
||||||
|
assert(sizeof(header.md5_hex) == md5.size());
|
||||||
|
|
||||||
|
if (0 != memcmp(&header.md5_hex[0], md5.c_str(), md5.size())) {
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
assert(mmap_length_ == sizeof(header) + header.elements_num * sizeof(double) + header.dat_size * dat_.unit_size());
|
||||||
|
idf_elements_ptr_ = (const double *)(mmap_addr_ + sizeof(header));
|
||||||
|
char * dat_ptr = mmap_addr_ + sizeof(header) + sizeof(double) * elements_num_;
|
||||||
|
dat_.set_array(dat_ptr, header.dat_size);
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
|
||||||
|
bool InitPinYinAttachDat(const string & dat_cache_file, const string & md5) {
|
||||||
|
mmap_fd_ = ::open(dat_cache_file.c_str(), O_RDONLY);
|
||||||
|
|
||||||
|
if (mmap_fd_ < 0) {
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
const auto seek_off = ::lseek(mmap_fd_, 0, SEEK_END);
|
||||||
|
assert(seek_off >= 0);
|
||||||
|
mmap_length_ = seek_off;
|
||||||
|
|
||||||
|
mmap_addr_ = reinterpret_cast<char *>(mmap(NULL, mmap_length_, PROT_READ, MAP_SHARED, mmap_fd_, 0));
|
||||||
|
assert(MAP_FAILED != mmap_addr_);
|
||||||
|
|
||||||
|
assert(mmap_length_ >= sizeof(CacheFileHeader));
|
||||||
|
CacheFileHeader & header = *reinterpret_cast<CacheFileHeader*>(mmap_addr_);
|
||||||
|
elements_num_ = header.elements_num;
|
||||||
|
min_weight_ = header.min_weight;
|
||||||
|
assert(sizeof(header.md5_hex) == md5.size());
|
||||||
|
|
||||||
|
if (0 != memcmp(&header.md5_hex[0], md5.c_str(), md5.size())) {
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
assert(mmap_length_ == sizeof(header) + header.elements_num * sizeof(PinYinMemElem) + header.dat_size * dat_.unit_size());
|
||||||
|
pinyin_elements_ptr_ = (const PinYinMemElem *)(mmap_addr_ + sizeof(header));
|
||||||
|
char * dat_ptr = mmap_addr_ + sizeof(header) + sizeof(PinYinMemElem) * elements_num_;
|
||||||
|
dat_.set_array(dat_ptr, header.dat_size);
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
|
||||||
|
private:
|
||||||
|
void BuildDatCache(vector<DatElement>& elements, const string & dat_cache_file, const string & md5) {
|
||||||
|
std::sort(elements.begin(), elements.end());
|
||||||
|
|
||||||
|
vector<const char*> keys_ptr_vec;
|
||||||
|
vector<int> values_vec;
|
||||||
|
vector<DatMemElem> mem_elem_vec;
|
||||||
|
|
||||||
|
keys_ptr_vec.reserve(elements.size());
|
||||||
|
values_vec.reserve(elements.size());
|
||||||
|
mem_elem_vec.reserve(elements.size());
|
||||||
|
|
||||||
|
CacheFileHeader header;
|
||||||
|
header.min_weight = min_weight_;
|
||||||
|
assert(sizeof(header.md5_hex) == md5.size());
|
||||||
|
memcpy(&header.md5_hex[0], md5.c_str(), md5.size());
|
||||||
|
|
||||||
|
for (size_t i = 0; i < elements.size(); ++i) {
|
||||||
|
keys_ptr_vec.push_back(elements[i].word.data());
|
||||||
|
values_vec.push_back(i);
|
||||||
|
mem_elem_vec.push_back(DatMemElem());
|
||||||
|
auto & mem_elem = mem_elem_vec.back();
|
||||||
|
mem_elem.weight = elements[i].weight;
|
||||||
|
mem_elem.SetTag(elements[i].tag);
|
||||||
|
}
|
||||||
|
|
||||||
|
auto const ret = dat_.build(keys_ptr_vec.size(), &keys_ptr_vec[0], NULL, &values_vec[0]);
|
||||||
|
assert(0 == ret);
|
||||||
|
header.elements_num = mem_elem_vec.size();
|
||||||
|
header.dat_size = dat_.size();
|
||||||
|
|
||||||
|
{
|
||||||
|
string tmp_filepath = string(dat_cache_file) + "_XXXXXX";
|
||||||
|
::umask(S_IWGRP | S_IWOTH);
|
||||||
|
//const int fd =::mkstemp(&tmp_filepath[0]);
|
||||||
|
const int fd =::mkstemp((char *)tmp_filepath.data());
|
||||||
|
qDebug() << "mkstemp :" << errno << tmp_filepath.data();
|
||||||
|
assert(fd >= 0);
|
||||||
|
::fchmod(fd, 0644);
|
||||||
|
|
||||||
|
auto write_bytes = ::write(fd, (const char *)&header, sizeof(header));
|
||||||
|
write_bytes += ::write(fd, (const char *)&mem_elem_vec[0], sizeof(mem_elem_vec[0]) * mem_elem_vec.size());
|
||||||
|
write_bytes += ::write(fd, dat_.array(), dat_.total_size());
|
||||||
|
|
||||||
|
assert(write_bytes == sizeof(header) + mem_elem_vec.size() * sizeof(mem_elem_vec[0]) + dat_.total_size());
|
||||||
|
::close(fd);
|
||||||
|
|
||||||
|
const auto rename_ret = ::rename(tmp_filepath.c_str(), dat_cache_file.c_str());
|
||||||
|
assert(0 == rename_ret);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
void BuildDatCache(vector<IdfElement>& elements, const string & dat_cache_file, const string & md5) {
|
||||||
|
std::sort(elements.begin(), elements.end());
|
||||||
|
|
||||||
|
vector<const char*> keys_ptr_vec;
|
||||||
|
vector<int> values_vec;
|
||||||
|
vector<double> mem_elem_vec;
|
||||||
|
|
||||||
|
keys_ptr_vec.reserve(elements.size());
|
||||||
|
values_vec.reserve(elements.size());
|
||||||
|
mem_elem_vec.reserve(elements.size());
|
||||||
|
|
||||||
|
CacheFileHeader header;
|
||||||
|
header.min_weight = min_weight_;
|
||||||
|
assert(sizeof(header.md5_hex) == md5.size());
|
||||||
|
memcpy(&header.md5_hex[0], md5.c_str(), md5.size());
|
||||||
|
|
||||||
|
for (size_t i = 0; i < elements.size(); ++i) {
|
||||||
|
keys_ptr_vec.push_back(elements[i].word.data());
|
||||||
|
values_vec.push_back(i);
|
||||||
|
mem_elem_vec.push_back(elements[i].idf);
|
||||||
|
}
|
||||||
|
|
||||||
|
auto const ret = dat_.build(keys_ptr_vec.size(), &keys_ptr_vec[0], NULL, &values_vec[0]);
|
||||||
|
assert(0 == ret);
|
||||||
|
header.elements_num = mem_elem_vec.size();
|
||||||
|
header.dat_size = dat_.size();
|
||||||
|
|
||||||
|
{
|
||||||
|
string tmp_filepath = string(dat_cache_file) + "_XXXXXX";
|
||||||
|
::umask(S_IWGRP | S_IWOTH);
|
||||||
|
//const int fd =::mkstemp(&tmp_filepath[0]);
|
||||||
|
const int fd =::mkstemp((char *)tmp_filepath.data());
|
||||||
|
qDebug() << "mkstemp :" << errno << tmp_filepath.data();
|
||||||
|
assert(fd >= 0);
|
||||||
|
::fchmod(fd, 0644);
|
||||||
|
|
||||||
|
auto write_bytes = ::write(fd, (const char *)&header, sizeof(header));
|
||||||
|
write_bytes += ::write(fd, (const char *)&mem_elem_vec[0], sizeof(double) * mem_elem_vec.size());
|
||||||
|
write_bytes += ::write(fd, dat_.array(), dat_.total_size());
|
||||||
|
|
||||||
|
assert(write_bytes == sizeof(header) + mem_elem_vec.size() * sizeof(double) + dat_.total_size());
|
||||||
|
::close(fd);
|
||||||
|
|
||||||
|
const auto rename_ret = ::rename(tmp_filepath.c_str(), dat_cache_file.c_str());
|
||||||
|
assert(0 == rename_ret);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
void BuildDatCache(vector<PinYinElement>& elements, const string & dat_cache_file, const string & md5) {
|
||||||
|
//std::sort(elements.begin(), elements.end());
|
||||||
|
|
||||||
|
vector<const char*> keys_ptr_vec;
|
||||||
|
vector<int> values_vec;
|
||||||
|
vector<PinYinMemElem> mem_elem_vec;
|
||||||
|
|
||||||
|
keys_ptr_vec.reserve(elements.size());
|
||||||
|
values_vec.reserve(elements.size());
|
||||||
|
mem_elem_vec.reserve(elements.size());
|
||||||
|
|
||||||
|
CacheFileHeader header;
|
||||||
|
header.min_weight = min_weight_;
|
||||||
|
assert(sizeof(header.md5_hex) == md5.size());
|
||||||
|
memcpy(&header.md5_hex[0], md5.c_str(), md5.size());
|
||||||
|
|
||||||
|
for (size_t i = 0; i < elements.size(); ++i) {
|
||||||
|
keys_ptr_vec.push_back(elements[i].word.data());
|
||||||
|
values_vec.push_back(i);
|
||||||
|
mem_elem_vec.push_back(PinYinMemElem());
|
||||||
|
auto & mem_elem = mem_elem_vec.back();
|
||||||
|
mem_elem.SetTag(elements[i].tag);
|
||||||
|
}
|
||||||
|
|
||||||
|
auto const ret = dat_.build(keys_ptr_vec.size(), &keys_ptr_vec[0], NULL, &values_vec[0]);
|
||||||
|
assert(0 == ret);
|
||||||
|
header.elements_num = mem_elem_vec.size();
|
||||||
|
header.dat_size = dat_.size();
|
||||||
|
|
||||||
|
{
|
||||||
|
string tmp_filepath = string(dat_cache_file) + "_XXXXXX";
|
||||||
|
::umask(S_IWGRP | S_IWOTH);
|
||||||
|
//const int fd =::mkstemp(&tmp_filepath[0]);
|
||||||
|
const int fd =::mkstemp((char *)tmp_filepath.data());
|
||||||
|
qDebug() << "mkstemp :" << errno << tmp_filepath.data();
|
||||||
|
assert(fd >= 0);
|
||||||
|
::fchmod(fd, 0644);
|
||||||
|
|
||||||
|
auto write_bytes = ::write(fd, (const char *)&header, sizeof(header));
|
||||||
|
write_bytes += ::write(fd, (const char *)&mem_elem_vec[0], sizeof(mem_elem_vec[0]) * mem_elem_vec.size());
|
||||||
|
write_bytes += ::write(fd, dat_.array(), dat_.total_size());
|
||||||
|
|
||||||
|
assert(write_bytes == sizeof(header) + mem_elem_vec.size() * sizeof(mem_elem_vec[0]) + dat_.total_size());
|
||||||
|
::close(fd);
|
||||||
|
|
||||||
|
const auto rename_ret = ::rename(tmp_filepath.c_str(), dat_cache_file.c_str());
|
||||||
|
assert(0 == rename_ret);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
DatTrie(const DatTrie &);
|
||||||
|
DatTrie &operator=(const DatTrie &);
|
||||||
|
|
||||||
|
private:
|
||||||
|
JiebaDAT dat_;
|
||||||
|
const DatMemElem * elements_ptr_ = nullptr;
|
||||||
|
const double * idf_elements_ptr_ = nullptr;
|
||||||
|
const PinYinMemElem * pinyin_elements_ptr_ = nullptr;
|
||||||
|
size_t elements_num_ = 0;
|
||||||
|
double min_weight_ = 0;
|
||||||
|
|
||||||
|
int mmap_fd_ = -1;
|
||||||
|
size_t mmap_length_ = 0;
|
||||||
|
char * mmap_addr_ = nullptr;
|
||||||
|
};
|
||||||
|
|
||||||
|
|
||||||
|
inline string CalcFileListMD5(const string & files_list, size_t & file_size_sum) {
|
||||||
|
limonp::MD5 md5;
|
||||||
|
|
||||||
|
const auto files = limonp::Split(files_list, "|;");
|
||||||
|
file_size_sum = 0;
|
||||||
|
|
||||||
|
for (auto const & local_path : files) {
|
||||||
|
const int fd = ::open(local_path.c_str(), O_RDONLY);
|
||||||
|
if( fd < 0){
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
auto const len = ::lseek(fd, 0, SEEK_END);
|
||||||
|
if (len > 0) {
|
||||||
|
void * addr = ::mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
|
||||||
|
assert(MAP_FAILED != addr);
|
||||||
|
|
||||||
|
md5.Update((unsigned char *) addr, len);
|
||||||
|
file_size_sum += len;
|
||||||
|
|
||||||
|
::munmap(addr, len);
|
||||||
|
}
|
||||||
|
::close(fd);
|
||||||
|
}
|
||||||
|
|
||||||
|
md5.Final();
|
||||||
|
return string(md5.digestChars);
|
||||||
|
}
|
||||||
|
|
||||||
|
}
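
The `Find` overload that fills `vector<WordRange>` walks the text backwards and keeps, for each position, the best total weight and the next word boundary. The following is a minimal sketch of that reverse dynamic programming idea under simplifying assumptions: a `std::unordered_map` stands in for the double-array trie, the text is ASCII only (one byte per character), and `Dict` and `segmentMaxProb` are hypothetical names that do not exist in the library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Toy dictionary: word -> log-probability weight (illustrative values only).
using Dict = std::unordered_map<std::string, double>;

// Splits `text` into the word sequence with the highest total weight.
// Characters not covered by the dictionary cost `min_weight` each.
std::vector<std::string> segmentMaxProb(const std::string& text, const Dict& dict,
                                        double min_weight, std::size_t max_word_len) {
    const std::size_t n = text.size();
    std::vector<double> best(n + 1, 0.0);      // best[i]: best weight of text[i..n)
    std::vector<std::size_t> next(n + 1, n);   // next[i]: end of the word starting at i

    for (std::size_t i = n; i-- > 0;) {        // walk from the end backwards
        // Fallback: a single character with the minimum weight.
        best[i] = min_weight + best[i + 1];
        next[i] = i + 1;
        for (std::size_t len = 1; len <= max_word_len && i + len <= n; ++len) {
            const auto it = dict.find(text.substr(i, len));
            if (it == dict.end()) continue;
            const double val = it->second + best[i + len];
            if (val > best[i]) {
                best[i] = val;
                next[i] = i + len;
            }
        }
    }

    // Follow the recorded boundaries from the start to collect the words.
    std::vector<std::string> words;
    for (std::size_t i = 0; i < n; i = next[i]) {
        assert(next[i] > i && next[i] <= n);
        words.push_back(text.substr(i, next[i] - i));
    }
    return words;
}
```

Unlike the original, the sketch uses `std::vector` instead of variable-length arrays and probes every substring instead of doing a common-prefix search, which is simpler but slower.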
|
|
@ -0,0 +1,234 @@
|
||||||
|
#pragma once
|
||||||
|
|
||||||
|
#include <iostream>
|
||||||
|
#include <fstream>
|
||||||
|
#include <map>
|
||||||
|
#include <string>
|
||||||
|
#include <cstring>
|
||||||
|
#include <cstdlib>
|
||||||
|
#include <stdint.h>
|
||||||
|
#include <cmath>
|
||||||
|
#include <limits>
|
||||||
|
#include "limonp/StringUtil.hpp"
|
||||||
|
#include "limonp/Logging.hpp"
|
||||||
|
#include "Unicode.hpp"
|
||||||
|
#include "DatTrie.hpp"
|
||||||
|
#include <QDebug>
|
||||||
|
namespace cppjieba {
|
||||||
|
|
||||||
|
using namespace limonp;
|
||||||
|
|
||||||
|
const double MAX_DOUBLE = 3.14e+100;
|
||||||
|
const size_t DICT_COLUMN_NUM = 3;
|
||||||
|
const char* const UNKNOWN_TAG = "";
|
||||||
|
|
||||||
|
class DictTrie {
|
||||||
|
public:
|
||||||
|
enum UserWordWeightOption {
|
||||||
|
WordWeightMin,
|
||||||
|
WordWeightMedian,
|
||||||
|
WordWeightMax,
|
||||||
|
}; // enum UserWordWeightOption
|
||||||
|
|
||||||
|
DictTrie(const string& dict_path, const string& user_dict_paths = "", const string & dat_cache_path = "",
|
||||||
|
UserWordWeightOption user_word_weight_opt = WordWeightMedian) {
|
||||||
|
Init(dict_path, user_dict_paths, dat_cache_path, user_word_weight_opt);
|
||||||
|
}
|
||||||
|
|
||||||
|
~DictTrie() {}
|
||||||
|
|
||||||
|
const DatMemElem* Find(const string & word) const {
|
||||||
|
return dat_.Find(word);
|
||||||
|
}
|
||||||
|
|
||||||
|
void FindDatDag(RuneStrArray::const_iterator begin,
|
||||||
|
RuneStrArray::const_iterator end,
|
||||||
|
vector<struct DatDag>&res,
|
||||||
|
size_t max_word_len = MAX_WORD_LENGTH) const {
|
||||||
|
dat_.Find(begin, end, res, max_word_len);
|
||||||
|
}
|
||||||
|
|
||||||
|
void FindWordRange(RuneStrArray::const_iterator begin,
|
||||||
|
RuneStrArray::const_iterator end,
|
||||||
|
vector<WordRange>& words,
|
||||||
|
size_t max_word_len = MAX_WORD_LENGTH) const {
|
||||||
|
dat_.Find(begin, end, words, max_word_len);
|
||||||
|
}
|
||||||
|
|
||||||
|
bool IsUserDictSingleChineseWord(const Rune& word) const {
|
||||||
|
return IsIn(user_dict_single_chinese_word_, word);
|
||||||
|
}
|
||||||
|
|
||||||
|
double GetMinWeight() const {
|
||||||
|
return dat_.GetMinWeight();
|
||||||
|
}
|
||||||
|
|
||||||
|
size_t GetTotalDictSize() const {
|
||||||
|
return total_dict_size_;
|
||||||
|
}
|
||||||
|
|
||||||
|
void InserUserDictNode(const string& line, bool saveNodeInfo = true) {
|
||||||
|
vector<string> buf;
|
||||||
|
DatElement node_info;
|
||||||
|
Split(line, buf, " ");
|
||||||
|
|
||||||
|
if (buf.size() == 0) {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
node_info.word = buf[0];
|
||||||
|
node_info.weight = user_word_default_weight_;
|
||||||
|
node_info.tag = UNKNOWN_TAG;
|
||||||
|
|
||||||
|
if (buf.size() == 2) {
|
||||||
|
node_info.tag = buf[1];
|
||||||
|
} else if (buf.size() == 3) {
|
||||||
|
if (freq_sum_ > 0.0) {
|
||||||
|
const int freq = atoi(buf[1].c_str());
|
||||||
|
node_info.weight = log(1.0 * freq / freq_sum_);
|
||||||
|
node_info.tag = buf[2];
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
if (saveNodeInfo) {
|
||||||
|
static_node_infos_.push_back(node_info);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (Utf8CharNum(node_info.word) == 1) {
|
||||||
|
RuneArray word;
|
||||||
|
|
||||||
|
if (DecodeRunesInString(node_info.word, word)) {
|
||||||
|
user_dict_single_chinese_word_.insert(word[0]);
|
||||||
|
} else {
|
||||||
|
XLOG(ERROR) << "Decode " << node_info.word << " failed.";
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
void LoadUserDict(const string& filePaths, bool saveNodeInfo = true) {
|
||||||
|
vector<string> files = limonp::Split(filePaths, "|;");
|
||||||
|
|
||||||
|
for (size_t i = 0; i < files.size(); i++) {
|
||||||
|
ifstream ifs(files[i].c_str());
|
||||||
|
XCHECK(ifs.is_open()) << "open " << files[i] << " failed";
|
||||||
|
string line;
|
||||||
|
|
||||||
|
for (; getline(ifs, line);) {
|
||||||
|
if (line.size() == 0) {
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
|
InserUserDictNode(line, saveNodeInfo);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
private:
|
||||||
|
void Init(const string& dict_path, const string& user_dict_paths, string dat_cache_path,
|
||||||
|
UserWordWeightOption user_word_weight_opt) {
|
||||||
|
const auto dict_list = dict_path + "|" + user_dict_paths;
|
||||||
|
size_t file_size_sum = 0;
|
||||||
|
const string md5 = CalcFileListMD5(dict_list, file_size_sum);
|
||||||
|
total_dict_size_ = file_size_sum;
|
||||||
|
|
||||||
|
if (dat_cache_path.empty()) {
|
||||||
|
dat_cache_path = "/tmp/" + md5 + ".dat_"; // when no location for the dictionary cache file is given, store it under /tmp by default
|
||||||
|
}
|
||||||
|
dat_cache_path += VERSION;
|
||||||
|
QString path = QString::fromStdString(dat_cache_path);
|
||||||
|
qDebug() << "#########Dict path:" << path;
|
||||||
|
if (dat_.InitAttachDat(dat_cache_path, md5)) {
|
||||||
|
LoadUserDict(user_dict_paths, false); // for load user_dict_single_chinese_word_;
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
LoadDefaultDict(dict_path);
|
||||||
|
freq_sum_ = CalcFreqSum(static_node_infos_);
|
||||||
|
CalculateWeight(static_node_infos_, freq_sum_);
|
||||||
|
double min_weight = 0;
|
||||||
|
SetStaticWordWeights(user_word_weight_opt, min_weight);
|
||||||
|
dat_.SetMinWeight(min_weight);
|
||||||
|
|
||||||
|
LoadUserDict(user_dict_paths);
|
||||||
|
const auto build_ret = dat_.InitBuildDat(static_node_infos_, dat_cache_path, md5);
|
||||||
|
assert(build_ret);
|
||||||
|
vector<DatElement>().swap(static_node_infos_);
|
||||||
|
}
|
||||||
|
|
||||||
|
void LoadDefaultDict(const string& filePath) {
|
||||||
|
ifstream ifs(filePath.c_str());
|
||||||
|
XCHECK(ifs.is_open()) << "open " << filePath << " failed.";
|
||||||
|
string line;
|
||||||
|
vector<string> buf;
|
||||||
|
|
||||||
|
for (; getline(ifs, line);) {
|
||||||
|
Split(line, buf, " ");
|
||||||
|
XCHECK(buf.size() == DICT_COLUMN_NUM) << "split result illegal, line:" << line;
|
||||||
|
DatElement node_info;
|
||||||
|
node_info.word = buf[0];
|
||||||
|
node_info.weight = atof(buf[1].c_str());
|
||||||
|
node_info.tag = buf[2];
|
||||||
|
static_node_infos_.push_back(node_info);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
static bool WeightCompare(const DatElement& lhs, const DatElement& rhs) {
|
||||||
|
return lhs.weight < rhs.weight;
|
||||||
|
}
|
||||||
|
|
||||||
|
void SetStaticWordWeights(UserWordWeightOption option, double & min_weight) {
|
||||||
|
XCHECK(!static_node_infos_.empty());
|
||||||
|
vector<DatElement> x = static_node_infos_;
|
||||||
|
sort(x.begin(), x.end(), WeightCompare);
|
||||||
|
if(x.empty()){
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
min_weight = x[0].weight;
|
||||||
|
const double max_weight_ = x[x.size() - 1].weight;
|
||||||
|
const double median_weight_ = x[x.size() / 2].weight;
|
||||||
|
|
||||||
|
switch (option) {
|
||||||
|
case WordWeightMin:
|
||||||
|
user_word_default_weight_ = min_weight;
|
||||||
|
break;
|
||||||
|
|
||||||
|
case WordWeightMedian:
|
||||||
|
user_word_default_weight_ = median_weight_;
|
||||||
|
break;
|
||||||
|
|
||||||
|
default:
|
||||||
|
user_word_default_weight_ = max_weight_;
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
double CalcFreqSum(const vector<DatElement>& node_infos) const {
|
||||||
|
double sum = 0.0;
|
||||||
|
|
||||||
|
for (size_t i = 0; i < node_infos.size(); i++) {
|
||||||
|
sum += node_infos[i].weight;
|
||||||
|
}
|
||||||
|
|
||||||
|
return sum;
|
||||||
|
}
|
||||||
|
|
||||||
|
void CalculateWeight(vector<DatElement>& node_infos, double sum) const {
|
||||||
|
for (size_t i = 0; i < node_infos.size(); i++) {
|
||||||
|
DatElement& node_info = node_infos[i];
|
||||||
|
assert(node_info.weight > 0.0);
|
||||||
|
node_info.weight = log(double(node_info.weight) / sum);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
private:
|
||||||
|
vector<DatElement> static_node_infos_;
|
||||||
|
size_t total_dict_size_ = 0;
|
||||||
|
DatTrie dat_;
|
||||||
|
|
||||||
|
double freq_sum_;
|
||||||
|
double user_word_default_weight_;
|
||||||
|
unordered_set<Rune> user_dict_single_chinese_word_;
|
||||||
|
};
|
||||||
|
}
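The weight handling in `Init`, `CalculateWeight` and `SetStaticWordWeights` above can be sketched in isolation as follows. This is a minimal sketch, not the library's code: `Entry`, `UserWeight`, `NormalizeWeights` and `DefaultUserWeight` are hypothetical names standing in for `DatElement`, `UserWordWeightOption` and the member functions shown in the diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// Hypothetical stand-in for DatElement; only the fields used here.
struct Entry {
    std::string word;
    double weight;  // raw frequency on input, log-probability after NormalizeWeights
};

enum class UserWeight { Min, Median, Max };  // mirrors WordWeightMin/Median/Max

// Replace each raw frequency by log(freq / sum), as CalcFreqSum and
// CalculateWeight do together in the diff.
inline void NormalizeWeights(std::vector<Entry>& entries) {
    double sum = 0.0;
    for (const Entry& e : entries) sum += e.weight;
    for (Entry& e : entries) {
        assert(e.weight > 0.0);
        e.weight = std::log(e.weight / sum);
    }
}

// Pick the default weight for user-dictionary words from the sorted weights.
// Takes the vector by value so the caller's order is left untouched.
inline double DefaultUserWeight(std::vector<Entry> entries, UserWeight opt) {
    assert(!entries.empty());
    std::sort(entries.begin(), entries.end(),
              [](const Entry& a, const Entry& b) { return a.weight < b.weight; });
    switch (opt) {
        case UserWeight::Min:    return entries.front().weight;
        case UserWeight::Median: return entries[entries.size() / 2].weight;
        default:                 return entries.back().weight;
    }
}
```

As in `Init`, the default user weight is chosen after normalization, so it is a log-probability rather than a raw frequency.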
|
||||||
|
|
|
@ -0,0 +1,67 @@
|
||||||
|
#pragma once
|
||||||
|
|
||||||
|
#include <algorithm>
|
||||||
|
#include <set>
|
||||||
|
#include <cassert>
|
||||||
|
#include "limonp/Logging.hpp"
|
||||||
|
#include "segment-trie/segment-trie.h"
|
||||||
|
//#include "DictTrie.hpp"
|
||||||
|
#include "SegmentBase.hpp"
|
||||||
|
#include "Unicode.hpp"
|
||||||
|
|
||||||
|
namespace cppjieba {
|
||||||
|
class FullSegment: public SegmentBase {
|
||||||
|
public:
|
||||||
|
FullSegment(const DictTrie* dictTrie)
|
||||||
|
: dictTrie_(dictTrie) {
|
||||||
|
assert(dictTrie_);
|
||||||
|
}
|
||||||
|
~FullSegment() { }
|
||||||
|
|
||||||
|
virtual void Cut(RuneStrArray::const_iterator begin,
|
||||||
|
RuneStrArray::const_iterator end,
|
||||||
|
vector<WordRange>& res, bool, size_t) const override {
|
||||||
|
assert(dictTrie_);
|
||||||
|
vector<struct DatDag> dags;
|
||||||
|
dictTrie_->FindDatDag(begin, end, dags);
|
||||||
|
size_t max_word_end_pos = 0;
|
||||||
|
|
||||||
|
for (size_t i = 0; i < dags.size(); i++) {
|
||||||
|
for (const auto & kv : dags[i].nexts) {
|
||||||
|
const size_t nextoffset = kv.first - 1;
|
||||||
|
assert(nextoffset < dags.size());
|
||||||
|
const auto wordLen = nextoffset - i + 1;
|
||||||
|
const bool is_not_covered_single_word = ((dags[i].nexts.size() == 1) && (max_word_end_pos <= i));
|
||||||
|
const bool is_oov = (nullptr == kv.second); //Out-of-Vocabulary
|
||||||
|
|
||||||
|
if ((is_not_covered_single_word) || ((not is_oov) && (wordLen >= 2))) {
|
||||||
|
WordRange wr(begin + i, begin + nextoffset);
|
||||||
|
res.push_back(wr);
|
||||||
|
}
|
||||||
|
|
||||||
|
max_word_end_pos = max(max_word_end_pos, nextoffset + 1);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
virtual void CutWithSentence(const string& s, RuneStrArray::const_iterator begin, RuneStrArray::const_iterator end, vector<string>& res, bool hmm,
|
||||||
|
size_t) const override {
|
||||||
|
std::ignore = s;
|
||||||
|
std::ignore = begin;
|
||||||
|
std::ignore = end;
|
||||||
|
std::ignore = res;
|
||||||
|
std::ignore = hmm;
|
||||||
|
}
|
||||||
|
virtual void CutWithSentence(const string& s, RuneStrArray::const_iterator begin, RuneStrArray::const_iterator end, unordered_map<string, KeyWord>& res, bool hmm,
|
||||||
|
size_t) const override {
|
||||||
|
std::ignore = s;
|
||||||
|
std::ignore = begin;
|
||||||
|
std::ignore = end;
|
||||||
|
std::ignore = res;
|
||||||
|
std::ignore = hmm;
|
||||||
|
}
|
||||||
|
private:
|
||||||
|
const DictTrie* dictTrie_;
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
|
@ -0,0 +1,158 @@
|
||||||
|
#pragma once
|
||||||
|
|
||||||
|
#include "limonp/StringUtil.hpp"
|
||||||
|
//#define USE_CEDAR_SEGMENT // preliminary tests with cedar showed roughly a 3%-5% performance loss and nearly 1 MB less memory use
|
||||||
|
#ifdef USE_CEDAR_SEGMENT
|
||||||
|
#include "cedar/cedar.h"
|
||||||
|
#endif
|
||||||
|
namespace cppjieba {
|
||||||
|
|
||||||
|
using namespace limonp;
|
||||||
|
#ifdef USE_CEDAR_SEGMENT
|
||||||
|
typedef cedar::da<float, -1, -2, false> EmitProbMap;
|
||||||
|
#else
|
||||||
|
typedef unordered_map<Rune, double> EmitProbMap;
|
||||||
|
#endif
|
||||||
|
struct HMMModel {
|
||||||
|
/*
|
||||||
|
* STATUS:
|
||||||
|
* 0: HMMModel::B, 1: HMMModel::E, 2: HMMModel::M, 3:HMMModel::S
|
||||||
|
* */
|
||||||
|
enum {B = 0, E = 1, M = 2, S = 3, STATUS_SUM = 4};
|
||||||
|
|
||||||
|
HMMModel(const string& modelPath) {
|
||||||
|
memset(startProb, 0, sizeof(startProb));
|
||||||
|
memset(transProb, 0, sizeof(transProb));
|
||||||
|
statMap[0] = 'B';
|
||||||
|
statMap[1] = 'E';
|
||||||
|
statMap[2] = 'M';
|
||||||
|
statMap[3] = 'S';
|
||||||
|
emitProbVec.push_back(&emitProbB);
|
||||||
|
emitProbVec.push_back(&emitProbE);
|
||||||
|
emitProbVec.push_back(&emitProbM);
|
||||||
|
emitProbVec.push_back(&emitProbS);
|
||||||
|
LoadModel(modelPath);
|
||||||
|
}
|
||||||
|
~HMMModel() {
|
||||||
|
}
|
||||||
|
void LoadModel(const string& filePath) {
|
||||||
|
ifstream ifile(filePath.c_str());
|
||||||
|
XCHECK(ifile.is_open()) << "open " << filePath << " failed";
|
||||||
|
string line;
|
||||||
|
vector<string> tmp;
|
||||||
|
vector<string> tmp2;
|
||||||
|
//Load startProb
|
||||||
|
XCHECK(GetLine(ifile, line));
|
||||||
|
Split(line, tmp, " ");
|
||||||
|
XCHECK(tmp.size() == STATUS_SUM);
|
||||||
|
|
||||||
|
for (size_t j = 0; j < tmp.size(); j++) {
|
||||||
|
startProb[j] = atof(tmp[j].c_str());
|
||||||
|
}
|
||||||
|
|
||||||
|
//Load transProb
|
||||||
|
for (size_t i = 0; i < STATUS_SUM; i++) {
|
||||||
|
XCHECK(GetLine(ifile, line));
|
||||||
|
Split(line, tmp, " ");
|
||||||
|
XCHECK(tmp.size() == STATUS_SUM);
|
||||||
|
|
||||||
|
for (size_t j = 0; j < tmp.size(); j++) {
|
||||||
|
transProb[i][j] = atof(tmp[j].c_str());
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
//Load emitProbB
|
||||||
|
XCHECK(GetLine(ifile, line));
|
||||||
|
XCHECK(LoadEmitProb(line, emitProbB));
|
||||||
|
|
||||||
|
//Load emitProbE
|
||||||
|
XCHECK(GetLine(ifile, line));
|
||||||
|
XCHECK(LoadEmitProb(line, emitProbE));
|
||||||
|
|
||||||
|
//Load emitProbM
|
||||||
|
XCHECK(GetLine(ifile, line));
|
||||||
|
XCHECK(LoadEmitProb(line, emitProbM));
|
||||||
|
|
||||||
|
//Load emitProbS
|
||||||
|
XCHECK(GetLine(ifile, line));
|
||||||
|
XCHECK(LoadEmitProb(line, emitProbS));
|
||||||
|
}
|
||||||
|
double GetEmitProb(const EmitProbMap* ptMp, Rune key,
|
||||||
|
double defVal)const {
|
||||||
|
#ifdef USE_CEDAR_SEGMENT
|
||||||
|
char str_key[8];
|
||||||
|
snprintf(str_key, sizeof(str_key), "%d", key);
|
||||||
|
float result = ptMp->exactMatchSearch<float>(str_key);
|
||||||
|
return result < 0 ? defVal : result;
|
||||||
|
#else
|
||||||
|
EmitProbMap::const_iterator cit = ptMp->find(key);
|
||||||
|
|
||||||
|
if (cit == ptMp->end()) {
|
||||||
|
return defVal;
|
||||||
|
}
|
||||||
|
|
||||||
|
return cit->second;
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
bool GetLine(ifstream& ifile, string& line) {
|
||||||
|
while (getline(ifile, line)) {
|
||||||
|
Trim(line);
|
||||||
|
|
||||||
|
if (line.empty()) {
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (StartsWith(line, "#")) {
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
bool LoadEmitProb(const string& line, EmitProbMap& mp) {
|
||||||
|
if (line.empty()) {
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
vector<string> tmp, tmp2;
|
||||||
|
RuneArray unicode;
|
||||||
|
Split(line, tmp, ",");
|
||||||
|
|
||||||
|
for (size_t i = 0; i < tmp.size(); i++) {
|
||||||
|
Split(tmp[i], tmp2, ":");
|
||||||
|
|
||||||
|
if (2 != tmp2.size()) {
|
||||||
|
XLOG(ERROR) << "emitProb illegal.";
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (!DecodeRunesInString(tmp2[0], unicode) || unicode.size() != 1) {
|
||||||
|
XLOG(ERROR) << "TransCode failed.";
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
#ifdef USE_CEDAR_SEGMENT
|
||||||
|
char str_key[8];
|
||||||
|
snprintf(str_key, sizeof(str_key), "%d", unicode[0]);
|
||||||
|
mp.update(str_key, std::strlen(str_key), atof(tmp2[1].c_str()));
|
||||||
|
#else
|
||||||
|
mp[unicode[0]] = atof(tmp2[1].c_str());
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
|
||||||
|
char statMap[STATUS_SUM];
|
||||||
|
double startProb[STATUS_SUM];
|
||||||
|
double transProb[STATUS_SUM][STATUS_SUM];
|
||||||
|
EmitProbMap emitProbB;
|
||||||
|
EmitProbMap emitProbE;
|
||||||
|
EmitProbMap emitProbM;
|
||||||
|
EmitProbMap emitProbS;
|
||||||
|
vector<EmitProbMap* > emitProbVec;
|
||||||
|
}; // struct HMMModel
|
||||||
|
|
||||||
|
} // namespace cppjieba
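The emission-probability lines read by `LoadEmitProb` above are comma-separated `key:probability` pairs. A minimal sketch of that parsing and of the fallback lookup follows. It is an illustration under assumptions: `EmitMap`, `ParseEmitLine` and `EmitProbOr` are hypothetical names, and keys are kept as UTF-8 strings instead of being decoded to `Rune` values as the real code does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <sstream>
#include <string>
#include <unordered_map>

// Simplification: keys stay UTF-8 strings rather than decoded Rune values.
using EmitMap = std::unordered_map<std::string, double>;

// Parse "key:prob,key:prob,..." into `out`; returns false on a malformed item.
inline bool ParseEmitLine(const std::string& line, EmitMap& out) {
    if (line.empty()) return false;
    std::istringstream items(line);
    std::string item;
    while (std::getline(items, item, ',')) {
        const std::size_t colon = item.rfind(':');
        if (colon == std::string::npos || colon == 0 || colon + 1 == item.size()) {
            return false;  // missing key, missing value, or no separator
        }
        out[item.substr(0, colon)] = std::atof(item.c_str() + colon + 1);
    }
    return true;
}

// Lookup with a fallback value, in the spirit of GetEmitProb's defVal.
inline double EmitProbOr(const EmitMap& m, const std::string& key, double def_val) {
    const auto it = m.find(key);
    return it == m.end() ? def_val : it->second;
}
```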
|
||||||
|
|
|
@ -0,0 +1,206 @@
|
||||||
|
#pragma once
|
||||||
|
|
||||||
|
#include <iostream>
|
||||||
|
#include <fstream>
|
||||||
|
#include <memory.h>
|
||||||
|
#include <cassert>
|
||||||
|
#include "HMMModel.hpp"
|
||||||
|
#include "SegmentBase.hpp"
|
||||||
|
|
||||||
|
namespace cppjieba {
|
||||||
|
|
||||||
|
const double MIN_DOUBLE = -3.14e+100;
|
||||||
|
|
||||||
|
class HMMSegment: public SegmentBase {
|
||||||
|
public:
|
||||||
|
HMMSegment(const HMMModel* model)
|
||||||
|
: model_(model) {
|
||||||
|
}
|
||||||
|
~HMMSegment() { }
|
||||||
|
|
||||||
|
virtual void Cut(RuneStrArray::const_iterator begin, RuneStrArray::const_iterator end, vector<WordRange>& res, bool,
|
||||||
|
size_t) const override {
|
||||||
|
RuneStrArray::const_iterator left = begin;
|
||||||
|
RuneStrArray::const_iterator right = begin;
|
||||||
|
|
||||||
|
while (right != end) {
|
||||||
|
if (right->rune < 0x80) { // ASCII character
|
||||||
|
if (left != right) {
|
||||||
|
InternalCut(left, right, res);
|
||||||
|
}
|
||||||
|
|
||||||
|
left = right;
|
||||||
|
|
||||||
|
do {
|
||||||
|
right = SequentialLetterRule(left, end); // returns left if *left is not an English letter; otherwise the end of the run of letters and digits starting at left
|
||||||
|
|
||||||
|
if (right != left) {
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
|
||||||
|
right = NumbersRule(left, end); // returns left if *left is not a digit; otherwise the end of the run of digits and '.' starting at left
|
||||||
|
|
||||||
|
if (right != left) {
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
|
||||||
|
right ++;
|
||||||
|
} while (false);
|
||||||
|
|
||||||
|
WordRange wr(left, right - 1);
|
||||||
|
res.push_back(wr);
|
||||||
|
left = right;
|
||||||
|
} else {
|
||||||
|
right++;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
if (left != right) {
|
||||||
|
InternalCut(left, right, res);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
virtual void CutWithSentence(const string& s, RuneStrArray::const_iterator begin, RuneStrArray::const_iterator end, vector<string>& res, bool hmm,
|
||||||
|
size_t) const override {
|
||||||
|
std::ignore = s;
|
||||||
|
std::ignore = begin;
|
||||||
|
std::ignore = end;
|
||||||
|
std::ignore = res;
|
||||||
|
std::ignore = hmm;
|
||||||
|
}
|
||||||
|
virtual void CutWithSentence(const string& s, RuneStrArray::const_iterator begin, RuneStrArray::const_iterator end, unordered_map<string, KeyWord>& res, bool hmm,
|
||||||
|
size_t) const override {
|
||||||
|
std::ignore = s;
|
||||||
|
std::ignore = begin;
|
||||||
|
std::ignore = end;
|
||||||
|
std::ignore = res;
|
||||||
|
std::ignore = hmm;
|
||||||
|
}
|
||||||
|
private:
|
||||||
|
// sequential letters rule
|
||||||
|
RuneStrArray::const_iterator SequentialLetterRule(RuneStrArray::const_iterator begin,
|
||||||
|
RuneStrArray::const_iterator end) const {
|
||||||
|
Rune x = begin->rune;
|
||||||
|
|
||||||
|
if (('a' <= x && x <= 'z') || ('A' <= x && x <= 'Z')) {
|
||||||
|
begin ++;
|
||||||
|
} else {
|
||||||
|
return begin;
|
||||||
|
}
|
||||||
|
|
||||||
|
while (begin != end) {
|
||||||
|
x = begin->rune;
|
||||||
|
|
||||||
|
if (('a' <= x && x <= 'z') || ('A' <= x && x <= 'Z') || ('0' <= x && x <= '9')) {
|
||||||
|
begin ++;
|
||||||
|
} else {
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return begin;
|
||||||
|
}
|
||||||
|
// numbers rule
|
||||||
|
RuneStrArray::const_iterator NumbersRule(RuneStrArray::const_iterator begin, RuneStrArray::const_iterator end) const {
|
||||||
|
Rune x = begin->rune;
|
||||||
|
|
||||||
|
if ('0' <= x && x <= '9') {
|
||||||
|
begin ++;
|
||||||
|
} else {
|
||||||
|
return begin;
|
||||||
|
}
|
||||||
|
|
||||||
|
while (begin != end) {
|
||||||
|
x = begin->rune;
|
||||||
|
|
||||||
|
if (('0' <= x && x <= '9') || x == '.') {
|
||||||
|
begin++;
|
||||||
|
} else {
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return begin;
|
||||||
|
}
|
||||||
|
void InternalCut(RuneStrArray::const_iterator begin, RuneStrArray::const_iterator end, vector<WordRange>& res) const {
|
||||||
|
vector<size_t> status;
|
||||||
|
Viterbi(begin, end, status);
|
||||||
|
|
||||||
|
RuneStrArray::const_iterator left = begin;
|
||||||
|
RuneStrArray::const_iterator right;
|
||||||
|
|
||||||
|
for (size_t i = 0; i < status.size(); i++) {
|
||||||
|
if (status[i] % 2) { //if (HMMModel::E == status[i] || HMMModel::S == status[i])
|
||||||
|
right = begin + i + 1;
|
||||||
|
WordRange wr(left, right - 1);
|
||||||
|
res.push_back(wr);
|
||||||
|
left = right;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
void Viterbi(RuneStrArray::const_iterator begin,
|
||||||
|
RuneStrArray::const_iterator end,
|
||||||
|
vector<size_t>& status) const {
|
||||||
|
size_t Y = HMMModel::STATUS_SUM;
|
||||||
|
size_t X = end - begin;
|
||||||
|
|
||||||
|
size_t XYSize = X * Y;
|
||||||
|
size_t now, old, stat;
|
||||||
|
double tmp, endE, endS;
|
||||||
|
|
||||||
|
//vector<int> path(XYSize);
|
||||||
|
//vector<double> weight(XYSize);
|
||||||
|
int path[XYSize];
|
||||||
|
double weight[XYSize];
|
||||||
|
|
||||||
|
//start
|
||||||
|
for (size_t y = 0; y < Y; y++) {
|
||||||
|
weight[0 + y * X] = model_->startProb[y] + model_->GetEmitProb(model_->emitProbVec[y], begin->rune, MIN_DOUBLE);
|
||||||
|
path[0 + y * X] = -1;
|
||||||
|
}
|
||||||
|
|
||||||
|
double emitProb;
|
||||||
|
|
||||||
|
for (size_t x = 1; x < X; x++) {
|
||||||
|
for (size_t y = 0; y < Y; y++) {
|
||||||
|
now = x + y * X;
|
||||||
|
weight[now] = MIN_DOUBLE;
|
||||||
|
path[now] = HMMModel::E; // warning
|
||||||
|
emitProb = model_->GetEmitProb(model_->emitProbVec[y], (begin + x)->rune, MIN_DOUBLE);
|
||||||
|
|
||||||
|
for (size_t preY = 0; preY < Y; preY++) {
|
||||||
|
old = x - 1 + preY * X;
|
||||||
|
tmp = weight[old] + model_->transProb[preY][y] + emitProb;
|
||||||
|
|
||||||
|
if (tmp > weight[now]) {
|
||||||
|
weight[now] = tmp;
|
||||||
|
path[now] = preY;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
endE = weight[X - 1 + HMMModel::E * X];
|
||||||
|
endS = weight[X - 1 + HMMModel::S * X];
|
||||||
|
stat = 0;
|
||||||
|
|
||||||
|
if (endE >= endS) {
|
||||||
|
stat = HMMModel::E;
|
||||||
|
} else {
|
||||||
|
stat = HMMModel::S;
|
||||||
|
}
|
||||||
|
|
||||||
|
status.resize(X);
|
||||||
|
|
||||||
|
for (int x = X - 1 ; x >= 0; x--) {
|
||||||
|
status[x] = stat;
|
||||||
|
stat = path[x + stat * X];
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
const HMMModel* model_;
|
||||||
|
}; // class HMMSegment
|
||||||
|
|
||||||
|
} // namespace cppjieba
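The `Viterbi` routine above can be sketched as a standalone function. This is a hedged illustration, not the code from the diff: it swaps the stack arrays for `std::vector` tables, takes emission scores as a precomputed table, and uses hypothetical names (`Model`, `kStates`, `kMinLogProb`). It assumes every row of `emit` has exactly `kStates` entries and that all values are log-probabilities.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum State : std::size_t { B = 0, E = 1, M = 2, S = 3, kStates = 4 };
const double kMinLogProb = -3.14e+100;  // same sentinel value as MIN_DOUBLE

// Hypothetical reduced model: start and transition log-probabilities only.
struct Model {
    double start[kStates];
    double trans[kStates][kStates];  // trans[prev][cur]
};

// emit[x][y] is the log-probability of the x-th character in state y.
// Returns the most likely state per position; the last state is forced to E or S.
inline std::vector<std::size_t> Viterbi(const Model& model,
                                        const std::vector<std::vector<double>>& emit) {
    const std::size_t n = emit.size();
    std::vector<std::size_t> states(n);
    if (n == 0) return states;

    // Heap-allocated tables indexed as [x * kStates + y].
    std::vector<double> weight(n * kStates, kMinLogProb);
    std::vector<std::size_t> prev(n * kStates, E);

    for (std::size_t y = 0; y < kStates; ++y) {
        weight[y] = model.start[y] + emit[0][y];
    }
    for (std::size_t x = 1; x < n; ++x) {
        for (std::size_t y = 0; y < kStates; ++y) {
            for (std::size_t p = 0; p < kStates; ++p) {
                const double w =
                    weight[(x - 1) * kStates + p] + model.trans[p][y] + emit[x][y];
                if (w > weight[x * kStates + y]) {
                    weight[x * kStates + y] = w;
                    prev[x * kStates + y] = p;
                }
            }
        }
    }

    // A word can only end in state E or S.
    std::size_t stat =
        weight[(n - 1) * kStates + E] >= weight[(n - 1) * kStates + S] ? E : S;
    for (std::size_t x = n; x-- > 0;) {
        states[x] = stat;
        stat = prev[x * kStates + stat];
    }
    return states;
}
```

Using heap-allocated vectors avoids depending on variable-length arrays, which are a compiler extension in C++, at the cost of two allocations per call.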
|
||||||
|
|
|
@ -0,0 +1,117 @@
|
||||||
|
#pragma once
|
||||||
|
|
||||||
|
#include <iostream>
|
||||||
|
#include <fstream>
|
||||||
|
#include <map>
|
||||||
|
#include <string>
|
||||||
|
#include <cstring>
|
||||||
|
#include <cstdlib>
|
||||||
|
#include <stdint.h>
|
||||||
|
#include <cmath>
|
||||||
|
#include <limits>
|
||||||
|
#include "limonp/StringUtil.hpp"
|
||||||
|
#include "limonp/Logging.hpp"
|
||||||
|
#include "Unicode.hpp"
|
||||||
|
#include "DatTrie.hpp"
|
||||||
|
#include <QDebug>
|
||||||
|
namespace cppjieba {
|
||||||
|
|
||||||
|
using namespace limonp;
|
||||||
|
|
||||||
|
const size_t IDF_COLUMN_NUM = 2;
|
||||||
|
|
||||||
|
class IdfTrie {
|
||||||
|
public:
|
||||||
|
enum UserWordWeightOption {
|
||||||
|
WordWeightMin,
|
||||||
|
WordWeightMedian,
|
||||||
|
WordWeightMax,
|
||||||
|
}; // enum UserWordWeightOption
|
||||||
|
|
||||||
|
IdfTrie(const string& dict_path, const string & dat_cache_path = "",
|
||||||
|
UserWordWeightOption user_word_weight_opt = WordWeightMedian) {
|
||||||
|
Init(dict_path, dat_cache_path, user_word_weight_opt);
|
||||||
|
}
|
||||||
|
|
||||||
|
~IdfTrie() {}
|
||||||
|
|
||||||
|
double Find(const string & word, std::size_t length = 0, std::size_t node_pos = 0) const {
|
||||||
|
return dat_.Find(word, length, node_pos);
|
||||||
|
}
|
||||||
|
|
||||||
|
size_t GetTotalDictSize() const {
|
||||||
|
return total_dict_size_;
|
||||||
|
}
|
||||||
|
|
||||||
|
private:
|
||||||
|
void Init(const string& dict_path, string dat_cache_path,
|
||||||
|
UserWordWeightOption user_word_weight_opt) {
|
||||||
|
size_t file_size_sum = 0;
|
||||||
|
const string md5 = CalcFileListMD5(dict_path, file_size_sum);
|
||||||
|
total_dict_size_ = file_size_sum;
|
||||||
|
|
||||||
|
if (dat_cache_path.empty()) {
|
||||||
|
dat_cache_path = "/tmp/" + md5 + ".dat_"; // if no location for the dictionary data file is specified, store it under /tmp by default
|
||||||
|
}
|
||||||
|
dat_cache_path += VERSION;
|
||||||
|
QString path = QString::fromStdString(dat_cache_path);
|
||||||
|
qDebug() << "#########Idf path:" << path;
|
||||||
|
if (dat_.InitIdfAttachDat(dat_cache_path, md5)) {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
LoadDefaultIdf(dict_path);
|
||||||
|
double idf_sum_ = CalcIdfSum(static_node_infos_);
|
||||||
|
assert(static_node_infos_.size());
|
||||||
|
idfAverage_ = idf_sum_ / static_node_infos_.size();
|
||||||
|
assert(idfAverage_ > 0.0);
|
||||||
|
double min_weight = 0;
|
||||||
|
dat_.SetMinWeight(min_weight);
|
||||||
|
|
||||||
|
const auto build_ret = dat_.InitBuildDat(static_node_infos_, dat_cache_path, md5);
|
||||||
|
assert(build_ret);
|
||||||
|
vector<IdfElement>().swap(static_node_infos_);
|
||||||
|
}
|
||||||
|
|
||||||
|
void LoadDefaultIdf(const string& filePath) {
|
||||||
|
ifstream ifs(filePath.c_str());
|
||||||
|
if(not ifs.is_open()){
|
||||||
|
return ;
|
||||||
|
}
|
||||||
|
XCHECK(ifs.is_open()) << "open " << filePath << " failed.";
|
||||||
|
string line;
|
||||||
|
vector<string> buf;
|
||||||
|
size_t lineno = 0;
|
||||||
|
|
||||||
|
for (; getline(ifs, line); lineno++) {
|
||||||
|
if (line.empty()) {
|
||||||
|
XLOG(ERROR) << "lineno: " << lineno << " empty. skipped.";
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
Split(line, buf, " ");
|
||||||
|
XCHECK(buf.size() == IDF_COLUMN_NUM) << "split result illegal, line:" << line;
|
||||||
|
IdfElement node_info;
|
||||||
|
node_info.word = buf[0];
|
||||||
|
node_info.idf = atof(buf[1].c_str());
|
||||||
|
static_node_infos_.push_back(node_info);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
double CalcIdfSum(const vector<IdfElement>& node_infos) const {
|
||||||
|
double sum = 0.0;
|
||||||
|
|
||||||
|
for (size_t i = 0; i < node_infos.size(); i++) {
|
||||||
|
sum += node_infos[i].idf;
|
||||||
|
}
|
||||||
|
|
||||||
|
return sum;
|
||||||
|
}
|
||||||
|
public:
|
||||||
|
double idfAverage_;
|
||||||
|
private:
|
||||||
|
vector<IdfElement> static_node_infos_;
|
||||||
|
size_t total_dict_size_ = 0;
|
||||||
|
DatTrie dat_;
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
|
@ -0,0 +1,99 @@
|
||||||
|
#pragma once
|
||||||
|
|
||||||
|
#include <memory>
|
||||||
|
#include "QuerySegment.hpp"
|
||||||
|
#include "KeywordExtractor.hpp"
|
||||||
|
#include "segment-trie/segment-trie.h"
|
||||||
|
|
||||||
|
namespace cppjieba {
|
||||||
|
|
||||||
|
class Jieba {
|
||||||
|
public:
|
||||||
|
Jieba(const string& dict_path,
|
||||||
|
const string& model_path,
|
||||||
|
const string& user_dict_path,
|
||||||
|
const string& idfPath = "",
|
||||||
|
const string& stopWordPath = "",
|
||||||
|
const string& dat_cache_path = "")
|
||||||
|
: dict_trie_(dict_path, user_dict_path, dat_cache_path),
|
||||||
|
model_(model_path),
|
||||||
|
mp_seg_(&dict_trie_),
|
||||||
|
hmm_seg_(&model_),
|
||||||
|
mix_seg_(&dict_trie_, &model_, stopWordPath),
|
||||||
|
full_seg_(&dict_trie_),
|
||||||
|
query_seg_(&dict_trie_, &model_, stopWordPath),
|
||||||
|
extractor(&dict_trie_, &model_, idfPath, dat_cache_path, stopWordPath){ }
|
||||||
|
~Jieba() { }
|
||||||
|
|
||||||
|
void Cut(const string& sentence, vector<string>& words, bool hmm = true) const {
|
||||||
|
mix_seg_.CutToStr(sentence, words, hmm);
|
||||||
|
}
|
||||||
|
void Cut(const string& sentence, vector<Word>& words, bool hmm = true) const {
|
||||||
|
mix_seg_.CutToWord(sentence, words, hmm);
|
||||||
|
}
|
||||||
|
void CutAll(const string& sentence, vector<string>& words) const {
|
||||||
|
full_seg_.CutToStr(sentence, words);
|
||||||
|
}
|
||||||
|
void CutAll(const string& sentence, vector<Word>& words) const {
|
||||||
|
full_seg_.CutToWord(sentence, words);
|
||||||
|
}
|
||||||
|
void CutForSearch(const string& sentence, vector<string>& words, bool hmm = true) const {
|
||||||
|
query_seg_.CutToStr(sentence, words, hmm);
|
||||||
|
}
|
||||||
|
void CutForSearch(const string& sentence, vector<Word>& words, bool hmm = true) const {
|
||||||
|
query_seg_.CutToWord(sentence, words, hmm);
|
||||||
|
}
|
||||||
|
void CutHMM(const string& sentence, vector<string>& words) const {
|
||||||
|
hmm_seg_.CutToStr(sentence, words);
|
||||||
|
}
|
||||||
|
void CutHMM(const string& sentence, vector<Word>& words) const {
|
||||||
|
hmm_seg_.CutToWord(sentence, words);
|
||||||
|
}
|
||||||
|
void CutSmall(const string& sentence, vector<string>& words, size_t max_word_len) const {
|
||||||
|
mp_seg_.CutToStr(sentence, words, false, max_word_len);
|
||||||
|
}
|
||||||
|
void CutSmall(const string& sentence, vector<Word>& words, size_t max_word_len) const {
|
||||||
|
mp_seg_.CutToWord(sentence, words, false, max_word_len);
|
||||||
|
}
|
||||||
|
|
||||||
|
void Tag(const string& sentence, vector<pair<string, string> >& words) const {
|
||||||
|
mix_seg_.Tag(sentence, words);
|
||||||
|
}
|
||||||
|
string LookupTag(const string &str) const {
|
||||||
|
return mix_seg_.LookupTag(str);
|
||||||
|
}
|
||||||
|
|
||||||
|
void ResetSeparators(const string& s) {
|
||||||
|
//TODO
|
||||||
|
mp_seg_.ResetSeparators(s);
|
||||||
|
hmm_seg_.ResetSeparators(s);
|
||||||
|
mix_seg_.ResetSeparators(s);
|
||||||
|
full_seg_.ResetSeparators(s);
|
||||||
|
query_seg_.ResetSeparators(s);
|
||||||
|
}
|
||||||
|
|
||||||
|
const DictTrie* GetDictTrie() const {
|
||||||
|
return &dict_trie_;
|
||||||
|
}
|
||||||
|
|
||||||
|
const HMMModel* GetHMMModel() const {
|
||||||
|
return &model_;
|
||||||
|
}
|
||||||
|
|
||||||
|
private:
|
||||||
|
DictTrie dict_trie_;
|
||||||
|
HMMModel model_;
|
||||||
|
|
||||||
|
// They share the same dict trie and model
|
||||||
|
MPSegment mp_seg_;
|
||||||
|
HMMSegment hmm_seg_;
|
||||||
|
MixSegment mix_seg_;
|
||||||
|
FullSegment full_seg_;
|
||||||
|
QuerySegment query_seg_;
|
||||||
|
|
||||||
|
public:
|
||||||
|
KeywordExtractor extractor;
|
||||||
|
}; // class Jieba
|
||||||
|
|
||||||
|
} // namespace cppjieba
|
||||||
|
|
|
@ -0,0 +1,100 @@
|
||||||
|
#pragma once
|
||||||
|
|
||||||
|
#include <cmath>
|
||||||
|
#include "MixSegment.hpp"
|
||||||
|
//#include "IdfTrie.hpp"
|
||||||
|
#include "idf-trie/idf-trie.h"
|
||||||
|
|
||||||
|
namespace cppjieba {
|
||||||
|
|
||||||
|
using namespace limonp;
|
||||||
|
using namespace std;
|
||||||
|
|
||||||
|
/*utf8*/
|
||||||
|
class KeywordExtractor {
|
||||||
|
public:
|
||||||
|
|
||||||
|
KeywordExtractor(const DictTrie* dictTrie,
|
||||||
|
const HMMModel* model,
|
||||||
|
const string& idfPath,
|
||||||
|
const string& dat_cache_path,
|
||||||
|
const string& stopWordPath)
|
||||||
|
: segment_(dictTrie, model, stopWordPath),
|
||||||
|
idf_trie_(idfPath, dat_cache_path){
|
||||||
|
}
|
||||||
|
~KeywordExtractor() {
|
||||||
|
}
|
||||||
|
|
||||||
|
void Extract(const string& sentence, vector<string>& keywords, size_t topN) const {
|
||||||
|
vector<KeyWord> topWords;
|
||||||
|
Extract(sentence, topWords, topN);
|
||||||
|
|
||||||
|
for (size_t i = 0; i < topWords.size(); i++) {
|
||||||
|
keywords.push_back(topWords[i].word);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
void Extract(const string& sentence, vector<pair<string, double> >& keywords, size_t topN) const {
|
||||||
|
vector<KeyWord> topWords;
|
||||||
|
Extract(sentence, topWords, topN);
|
||||||
|
|
||||||
|
for (size_t i = 0; i < topWords.size(); i++) {
|
||||||
|
keywords.push_back(pair<string, double>(topWords[i].word, topWords[i].weight));
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
void Extract(const string& sentence, vector<KeyWord>& keywords, size_t topN) const {
|
||||||
|
|
||||||
|
unordered_map<string, KeyWord> wordmap; // map from string to KeyWord; repeated strings accumulate term frequency into the weight
|
||||||
|
PreFilter pre_filter(symbols_, sentence);
|
||||||
|
RuneStrArray::const_iterator null_p;
|
||||||
|
WordRange range(null_p, null_p);
|
||||||
|
bool isNull(false);
|
||||||
|
while (pre_filter.Next(range, isNull)) {
|
||||||
|
if (isNull) {
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
segment_.CutToStr(sentence, range, wordmap);
|
||||||
|
}
|
||||||
|
|
||||||
|
keywords.clear();
|
||||||
|
keywords.reserve(wordmap.size());
|
||||||
|
|
||||||
|
for (unordered_map<string, KeyWord>::iterator itr = wordmap.begin(); itr != wordmap.end(); ++itr) {
|
||||||
|
double idf = idf_trie_.Find(itr->first);
|
||||||
|
if (-1 != idf) { // the word was found in the IDF dictionary
|
||||||
|
itr->second.weight *= idf;
|
||||||
|
} else {
|
||||||
|
itr->second.weight *= idf_trie_.GetIdfAverage();
|
||||||
|
}
|
||||||
|
|
||||||
|
itr->second.word = itr->first;
|
||||||
|
keywords.push_back(itr->second);
|
||||||
|
}
|
||||||
|
|
||||||
|
topN = min(topN, keywords.size());
|
||||||
|
partial_sort(keywords.begin(), keywords.begin() + topN, keywords.end(), Compare);
|
||||||
|
keywords.resize(topN);
|
||||||
|
}
|
||||||
|
private:
|
||||||
|
|
||||||
|
static bool Compare(const KeyWord& lhs, const KeyWord& rhs) {
|
||||||
|
return lhs.weight > rhs.weight;
|
||||||
|
}
|
||||||
|
|
||||||
|
MixSegment segment_;
|
||||||
|
IdfTrie idf_trie_;
|
||||||
|
|
||||||
|
|
||||||
|
unordered_set<Rune> symbols_;
|
||||||
|
}; // class KeywordExtractor
|
||||||
|
|
||||||
|
inline ostream& operator << (ostream& os, const KeyWord& word) {
|
||||||
|
return os << "{\"word\": \"" << word.word << "\", \"offset\": " << word.offsets << ", \"weight\": " << word.weight <<
|
||||||
|
"}";
|
||||||
|
}
|
||||||
|
|
||||||
|
} // namespace cppjieba
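The scoring step of `KeywordExtractor::Extract` above (term frequency times IDF, with the average IDF as a fallback, followed by a partial sort) can be sketched on its own. This is a minimal sketch under assumptions: `Scored` and `TopKeywords` are hypothetical names, and plain hash maps stand in for the segmenter output and for `IdfTrie`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

struct Scored {
    std::string word;
    double weight;
};

// tf: term frequency per word; idf: known IDF values;
// idf_average: fallback factor for words missing from idf.
inline std::vector<Scored> TopKeywords(const std::unordered_map<std::string, double>& tf,
                                       const std::unordered_map<std::string, double>& idf,
                                       double idf_average, std::size_t top_n) {
    std::vector<Scored> out;
    out.reserve(tf.size());
    for (const auto& kv : tf) {
        const auto it = idf.find(kv.first);
        const double factor = (it != idf.end()) ? it->second : idf_average;
        out.push_back({kv.first, kv.second * factor});
    }
    // Only the first top_n entries need to be in order.
    top_n = std::min(top_n, out.size());
    std::partial_sort(out.begin(), out.begin() + top_n, out.end(),
                      [](const Scored& a, const Scored& b) { return a.weight > b.weight; });
    out.resize(top_n);
    return out;
}
```

`std::partial_sort` orders only the leading `top_n` elements, which is cheaper than a full sort when few keywords are requested.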
|
||||||
|
|
||||||
|
|
||||||
|
|
|
@ -0,0 +1,133 @@
|
||||||
|
#pragma once
|
||||||
|
|
||||||
|
#include <algorithm>
|
||||||
|
#include <set>
|
||||||
|
#include <cassert>
|
||||||
|
#include "limonp/Logging.hpp"
|
||||||
|
#include "segment-trie/segment-trie.h"
|
||||||
|
//#include "DictTrie.hpp"
|
||||||
|
#include "SegmentTagged.hpp"
|
||||||
|
#include "PosTagger.hpp"
|
||||||
|
|
||||||
|
namespace cppjieba {
|
||||||
|
|
||||||
|
class MPSegment: public SegmentTagged {
|
||||||
|
public:
|
||||||
|
MPSegment(const DictTrie* dictTrie)
|
||||||
|
: dictTrie_(dictTrie) {
|
||||||
|
assert(dictTrie_);
|
||||||
|
}
|
||||||
|
~MPSegment() { }
|
||||||
|
|
||||||
|
virtual void Cut(RuneStrArray::const_iterator begin,
|
||||||
|
RuneStrArray::const_iterator end,
|
||||||
|
vector<WordRange>& words,
|
||||||
|
bool, size_t max_word_len) const override {
|
||||||
|
dictTrie_->FindWordRange(begin, end, words, max_word_len);
|
||||||
|
}
|
||||||
|
|
||||||
|
virtual void CutWithSentence(const string& s, RuneStrArray::const_iterator begin, RuneStrArray::const_iterator end, vector<string>& res, bool hmm,
|
||||||
|
size_t) const override {
|
||||||
|
std::ignore = s;
|
||||||
|
std::ignore = begin;
|
||||||
|
std::ignore = end;
|
||||||
|
std::ignore = res;
|
||||||
|
std::ignore = hmm;
|
||||||
|
}
|
||||||
|
virtual void CutWithSentence(const string& s, RuneStrArray::const_iterator begin, RuneStrArray::const_iterator end, unordered_map<string, KeyWord>& res, bool hmm,
|
||||||
|
size_t) const override {
|
||||||
|
std::ignore = s;
|
||||||
|
std::ignore = begin;
|
||||||
|
std::ignore = end;
|
||||||
|
std::ignore = res;
|
||||||
|
std::ignore = hmm;
|
||||||
|
}
|
||||||
|
const DictTrie* GetDictTrie() const override {
|
||||||
|
return dictTrie_;
|
||||||
|
}
|
||||||
|
|
||||||
|
bool Tag(const string& src, vector<pair<string, string> >& res) const override {
|
||||||
|
return tagger_.Tag(src, res, *this);
|
||||||
|
}
|
||||||
|
|
||||||
|
bool IsUserDictSingleChineseWord(const Rune& value) const {
|
||||||
|
return dictTrie_->IsUserDictSingleChineseWord(value);
|
||||||
|
}
|
||||||
|
private:
|
||||||
|
/*
|
||||||
|
void CalcDP(vector<DatDag>& dags) const {
|
||||||
|
double val(0);
|
||||||
|
for (auto rit = dags.rbegin(); rit != dags.rend(); rit++) {
|
||||||
|
rit->max_next = -1;
|
||||||
|
rit->max_weight = MIN_DOUBLE;
|
||||||
|
|
||||||
|
for (const auto & it : rit->nexts) {
|
||||||
|
const auto nextPos = it.first;
|
||||||
|
val = dictTrie_->GetMinWeight();
|
||||||
|
|
||||||
|
if (nullptr != it.second) {
|
||||||
|
val = it.second->weight;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (nextPos < dags.size()) {
|
||||||
|
val += dags[nextPos].max_weight;
|
||||||
|
}
|
||||||
|
|
||||||
|
if ((nextPos <= dags.size()) && (val > rit->max_weight)) {
|
||||||
|
rit->max_weight = val;
|
||||||
|
rit->max_next = nextPos;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
*/
|
||||||
|
/* CalcDP rewritten to iterate in reverse order; preliminary testing found no problems */
|
||||||
|
/*
|
||||||
|
void CalcDP(vector<DatDag>& dags) const {
|
||||||
|
double val(0);
|
||||||
|
size_t size = dags.size();
|
||||||
|
|
||||||
|
for (size_t i = 0; i < size; i++) {
|
||||||
|
dags[size - 1 - i].max_next = -1;
|
||||||
|
dags[size - 1 - i].max_weight = MIN_DOUBLE;
|
||||||
|
|
||||||
|
for (const auto & it : dags[size - 1 - i].nexts) {
|
||||||
|
const auto nextPos = it.first;
|
||||||
|
if (nullptr != it.second) {
|
||||||
|
val = it.second->weight;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (nextPos < dags.size()) {
|
||||||
|
val += dags[nextPos].max_weight;
|
||||||
|
}
|
||||||
|
|
||||||
|
if ((nextPos <= dags.size()) && (val > dags[size - 1 - i].max_weight)) {
|
||||||
|
dags[size - 1 - i].max_weight = val;
|
||||||
|
dags[size - 1 - i].max_next = nextPos;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
void CutByDag(RuneStrArray::const_iterator begin,
|
||||||
|
RuneStrArray::const_iterator,
|
||||||
|
const vector<DatDag>& dags,
|
||||||
|
vector<WordRange>& words) const {
|
||||||
|
|
||||||
|
for (size_t i = 0; i < dags.size();) {
|
||||||
|
const auto next = dags[i].max_next;
|
||||||
|
assert(next > i);
|
||||||
|
assert(next <= dags.size());
|
||||||
|
WordRange wr(begin + i, begin + next - 1);
|
||||||
|
words.push_back(wr);
|
||||||
|
i = next;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
*/ // the related functionality has been integrated into the Find function
|
||||||
|
const DictTrie* dictTrie_;
|
||||||
|
PosTagger tagger_;
|
||||||
|
|
||||||
|
}; // class MPSegment
|
||||||
|
|
||||||
|
} // namespace cppjieba
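The commented-out `CalcDP` and `CutByDag` above describe a backward dynamic program over a word DAG, which the diff says now lives inside the trie's find routine. A standalone sketch of the idea follows. It is an illustration under assumptions: `DagNode` is a hypothetical reduced form of `DatDag`, every edge carries its own log weight (the original substitutes a minimum weight for out-of-vocabulary edges), and every position is assumed to have at least one valid outgoing edge.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

const double kMinWeight = -3.14e+100;

// One node per input position. Each edge is (next_pos, log_weight), where
// next_pos is the index just past the word that starts at this position.
struct DagNode {
    std::vector<std::pair<std::size_t, double>> nexts;
    double max_weight = kMinWeight;
    std::size_t max_next = 0;
};

// Walk positions from last to first; the best score at i is the best
// edge weight plus the best score already computed at the edge's target.
inline void CalcDP(std::vector<DagNode>& dags) {
    for (std::size_t i = dags.size(); i-- > 0;) {
        DagNode& node = dags[i];
        node.max_weight = kMinWeight;
        for (const auto& edge : node.nexts) {
            const std::size_t next = edge.first;
            if (next <= i || next > dags.size()) continue;  // skip malformed edges
            double val = edge.second;
            if (next < dags.size()) val += dags[next].max_weight;
            if (val > node.max_weight) {
                node.max_weight = val;
                node.max_next = next;
            }
        }
    }
}

// Follow max_next from position 0 and return half-open [begin, end) word spans.
inline std::vector<std::pair<std::size_t, std::size_t>> CutByDag(
        const std::vector<DagNode>& dags) {
    std::vector<std::pair<std::size_t, std::size_t>> words;
    for (std::size_t i = 0; i < dags.size();) {
        const std::size_t next = dags[i].max_next;
        assert(next > i && next <= dags.size());
        words.emplace_back(i, next);
        i = next;
    }
    return words;
}
```

Unlike the `WordRange` objects in the diff, which store an inclusive right end, the spans returned here are half-open.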
|
||||||
|
|
|
@ -0,0 +1,276 @@
|
||||||
|
#pragma once
|
||||||
|
|
||||||
|
#include <cassert>
|
||||||
|
#include "MPSegment.hpp"
|
||||||
|
#include "HMMSegment.hpp"
|
||||||
|
#include "limonp/StringUtil.hpp"
|
||||||
|
#include "PosTagger.hpp"
|
||||||
|
#define STOP_WORDS_USE_CEDAR_SEGMENT // preliminary tests with cedar showed roughly a 3%-5% performance gain; the memory reduction is not noticeable
|
||||||
|
#ifdef STOP_WORDS_USE_CEDAR_SEGMENT
|
||||||
|
#include "cedar/cedar.h"
|
||||||
|
#endif
|
||||||
|
|
||||||
|
namespace cppjieba {
|
||||||
|
class MixSegment: public SegmentTagged {
|
||||||
|
public:
|
||||||
|
MixSegment(const DictTrie* dictTrie,
|
||||||
|
const HMMModel* model,
|
||||||
|
const string& stopWordPath)
|
||||||
|
: mpSeg_(dictTrie), hmmSeg_(model) {
|
||||||
|
LoadStopWordDict(stopWordPath);
|
||||||
|
}
|
||||||
|
~MixSegment() {}
|
||||||
|
|
||||||
|
virtual void Cut(RuneStrArray::const_iterator begin, RuneStrArray::const_iterator end, vector<WordRange>& res, bool hmm,
|
||||||
|
size_t) const override {
|
||||||
|
if (!hmm) {
|
||||||
|
mpSeg_.CutRuneArray(begin, end, res);
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
vector<WordRange> words;
|
||||||
|
assert(end >= begin);
|
||||||
|
words.reserve(end - begin);
|
||||||
|
mpSeg_.CutRuneArray(begin, end, words);
|
||||||
|
|
||||||
|
vector<WordRange> hmmRes;
|
||||||
|
hmmRes.reserve(end - begin);
|
||||||
|
|
||||||
|
for (size_t i = 0; i < words.size(); i++) {
|
||||||
|
//if mp Get a word, it's ok, put it into result
|
||||||
|
if (words[i].left != words[i].right || (words[i].left == words[i].right &&
|
||||||
|
mpSeg_.IsUserDictSingleChineseWord(words[i].left->rune))) {
|
||||||
|
res.push_back(words[i]);
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
|
// if mp Get a single one and it is not in userdict, collect it in sequence
|
||||||
|
size_t j = i;
|
||||||
|
|
||||||
|
while (j < words.size() && words[j].left == words[j].right &&
|
||||||
|
!mpSeg_.IsUserDictSingleChineseWord(words[j].left->rune)) {
|
||||||
|
j++;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Cut the sequence with hmm
|
||||||
|
assert(j - 1 >= i);
|
||||||
|
// TODO
|
||||||
|
hmmSeg_.CutRuneArray(words[i].left, words[j - 1].left + 1, hmmRes);
|
||||||
|
|
||||||
|
//put hmm result to result
|
||||||
|
for (size_t k = 0; k < hmmRes.size(); k++) {
|
||||||
|
res.push_back(hmmRes[k]);
|
||||||
|
}
|
||||||
|
|
||||||
|
//clear tmp vars
|
||||||
|
hmmRes.clear();
|
||||||
|
|
||||||
|
//let i jump over this piece
|
||||||
|
i = j - 1;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
virtual void CutWithSentence(const string& s, RuneStrArray::const_iterator begin, RuneStrArray::const_iterator end, vector<string>& res, bool hmm,
|
||||||
|
size_t) const override {
|
||||||
|
// HMM is currently always enabled; change this later if it ever needs to be switched off --jxx20210519
|
||||||
|
// if (!hmm) {
|
||||||
|
// mpSeg_.CutRuneArray(begin, end, res);
|
||||||
|
// return;
|
||||||
|
// }
|
||||||
|
std::ignore = hmm;
|
||||||
|
vector<WordRange> words;
|
||||||
|
assert(end >= begin);
|
||||||
|
words.reserve(end - begin);
|
||||||
|
mpSeg_.CutRuneArray(begin, end, words);
|
||||||
|
|
||||||
|
vector<WordRange> hmmRes;
|
||||||
|
hmmRes.reserve(end - begin);
|
||||||
|
|
||||||
|
for (size_t i = 0; i < words.size(); i++) {
|
||||||
|
//if mp Get a word, it's ok, put it into result
|
||||||
|
if (words[i].left != words[i].right) {
|
||||||
|
res.push_back(GetStringFromRunes(s, words[i].left, words[i].right));
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
if (mpSeg_.IsUserDictSingleChineseWord(words[i].left->rune)
|
||||||
|
|| i == (words.size() - 1)) { // after i++, if this is the last character, push_back it directly
|
||||||
|
res.push_back(GetStringFromRunes(s, words[i].left, words[i].right));
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
|
// if mp Get a single one and it is not in userdict, collect it in sequence
|
||||||
|
size_t j = i + 1; // word i is a single character that is not in the user dictionary (and is not the last one), so check word j directly
|
||||||
|
|
||||||
|
while (j < (words.size() - 1) && words[j].left == words[j].right &&
|
||||||
|
!mpSeg_.IsUserDictSingleChineseWord(words[j].left->rune)) {
|
||||||
|
j++;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Cut the sequence with hmm
|
||||||
|
assert(j - 1 >= i);
|
||||||
|
// TODO
|
||||||
|
hmmSeg_.CutRuneArray(words[i].left, words[j - 1].left + 1, hmmRes);
|
||||||
|
|
||||||
|
//put hmm result to result
|
||||||
|
for (size_t k = 0; k < hmmRes.size(); k++) {
|
||||||
|
res.push_back(GetStringFromRunes(s, hmmRes[k].left, hmmRes[k].right));
|
||||||
|
}
|
||||||
|
|
||||||
|
//clear tmp vars
|
||||||
|
hmmRes.clear();
|
||||||
|
|
||||||
|
//let i jump over this piece
|
||||||
|
i = j - 1;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
virtual void CutWithSentence(const string& s, RuneStrArray::const_iterator begin, RuneStrArray::const_iterator end, unordered_map<string, KeyWord>& res, bool hmm,
|
||||||
|
size_t) const override {
|
||||||
|
std::ignore = hmm;
|
||||||
|
vector<WordRange> words;
|
||||||
|
vector<WordRange> hmmRes;
|
||||||
|
assert(end >= begin);
|
||||||
|
if (3 == begin->len or 4 == begin->len) {
|
||||||
|
words.reserve(end - begin);
|
||||||
|
mpSeg_.CutRuneArray(begin, end, words);
|
||||||
|
hmmRes.reserve(words.size());
|
||||||
|
} else {
|
||||||
|
hmmRes.reserve(end - begin);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (words.size() != 0) { // Chinese segmentation results exist
|
||||||
|
for (size_t i = 0; i < words.size(); i++) {
|
||||||
|
|
||||||
|
string str = GetStringFromRunes(s, words[i].left, words[i].right);
|
||||||
|
|
||||||
|
if (words[i].left != words[i].right) {
|
||||||
|
#ifdef STOP_WORDS_USE_CEDAR_SEGMENT
|
||||||
|
if (0 < stopWords_.exactMatchSearch<int>(str.c_str(), str.size())) {
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
#else
|
||||||
|
if (stopWords_.find(str) != stopWords_.end()) {
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
#endif
|
||||||
|
res[str].offsets.push_back(words[i].left->offset);
|
||||||
|
res[str].weight += 1.0;
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (mpSeg_.IsUserDictSingleChineseWord(words[i].left->rune)
|
||||||
|
|| i == (words.size() - 1)) { // after i++, if this is the last character, push_back it directly
|
||||||
|
#ifdef STOP_WORDS_USE_CEDAR_SEGMENT
|
||||||
|
if (0 < stopWords_.exactMatchSearch<int>(str.c_str(), str.size())) {
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
#else
|
||||||
|
if (stopWords_.find(str) != stopWords_.end()) {
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
#endif
|
||||||
|
res[str].offsets.push_back(words[i].left->offset);
|
||||||
|
res[str].weight += 1.0;
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
// if mp Get a single one and it is not in userdict, collect it in sequence
|
||||||
|
size_t j = i + 1; // word i is a single character that is not in the user dictionary (and is not the last one), so check word j directly
|
||||||
|
bool isLastWordsSingle(false);
|
||||||
|
while (j <= (words.size() - 1)
|
||||||
|
&& words[j].left == words[j].right
|
||||||
|
&& !mpSeg_.IsUserDictSingleChineseWord(words[j].left->rune)) {
|
||||||
|
if (j == (words.size() - 1)) { // the last segmentation result is a single character
|
||||||
|
isLastWordsSingle = true;
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
j++;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Cut the sequence with hmm
|
||||||
|
assert(j - 1 >= i);
|
||||||
|
// TODO
|
||||||
|
if (isLastWordsSingle) {
|
||||||
|
hmmSeg_.CutRuneArray(words[i].left, words[j].left + 1, hmmRes);
|
||||||
|
} else {
|
||||||
|
hmmSeg_.CutRuneArray(words[i].left, words[j].left, hmmRes);
|
||||||
|
}
|
||||||
|
|
||||||
|
//put hmm result to result
|
||||||
|
for (size_t k = 0; k < hmmRes.size(); k++) {
|
||||||
|
string hmmStr = GetStringFromRunes(s, hmmRes[k].left, hmmRes[k].right);
|
||||||
|
#ifdef STOP_WORDS_USE_CEDAR_SEGMENT
|
||||||
|
if (0 < stopWords_.exactMatchSearch<int>(hmmStr.c_str(), hmmStr.size())) {
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
#else
|
||||||
|
if (/*IsSingleWord(hmmStr) || */stopWords_.find(hmmStr) != stopWords_.end()) {
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
#endif
|
||||||
|
|
||||||
|
res[hmmStr].offsets.push_back(hmmRes[k].left->offset);
|
||||||
|
res[hmmStr].weight += 1.0;
|
||||||
|
}
|
||||||
|
|
||||||
|
//clear tmp vars
|
||||||
|
hmmRes.clear();
|
||||||
|
|
||||||
|
//let i jump over this piece
|
||||||
|
if (isLastWordsSingle) {
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
i = j - 1;
|
||||||
|
}
|
||||||
|
} else { // no Chinese segmentation results
|
||||||
|
for (size_t i = 0; i < (size_t)(end - begin); i++) {
|
||||||
|
string str = s.substr((begin+i)->offset, (begin+i)->len);
|
||||||
|
res[str].offsets.push_back((begin+i)->offset);
|
||||||
|
res[str].weight += 1.0;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
const DictTrie* GetDictTrie() const override {
|
||||||
|
return mpSeg_.GetDictTrie();
|
||||||
|
}
|
||||||
|
|
||||||
|
bool Tag(const string& src, vector<pair<string, string> >& res) const override {
|
||||||
|
return tagger_.Tag(src, res, *this);
|
||||||
|
}
|
||||||
|
|
||||||
|
string LookupTag(const string &str) const {
|
||||||
|
return tagger_.LookupTag(str, *this);
|
||||||
|
}
|
||||||
|
|
||||||
|
void LoadStopWordDict(const string& filePath) {
|
||||||
|
ifstream ifs(filePath.c_str());
|
||||||
|
if(not ifs.is_open()){
|
||||||
|
return ;
|
||||||
|
}
|
||||||
|
XCHECK(ifs.is_open()) << "open " << filePath << " failed";
|
||||||
|
string line ;
|
||||||
|
|
||||||
|
while (getline(ifs, line)) {
|
||||||
|
#ifdef STOP_WORDS_USE_CEDAR_SEGMENT
|
||||||
|
stopWords_.update(line.c_str(), line.size(), 1);
|
||||||
|
#else
|
||||||
|
stopWords_.insert(line);
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
|
||||||
|
assert(stopWords_.size());
|
||||||
|
}
|
||||||
|
private:
|
||||||
|
#ifdef STOP_WORDS_USE_CEDAR_SEGMENT
|
||||||
|
cedar::da<int, -1, -2, false> stopWords_;
|
||||||
|
#else
|
||||||
|
unordered_set<string> stopWords_;
|
||||||
|
#endif
|
||||||
|
MPSegment mpSeg_;
|
||||||
|
HMMSegment hmmSeg_;
|
||||||
|
PosTagger tagger_;
|
||||||
|
|
||||||
|
}; // class MixSegment
|
||||||
|
|
||||||
|
} // namespace cppjieba
|
||||||
|
|
|
@ -0,0 +1,154 @@
|
||||||
|
#pragma once
|
||||||
|
|
||||||
|
#include <iostream>
|
||||||
|
#include <fstream>
|
||||||
|
#include <map>
|
||||||
|
#include <string>
|
||||||
|
#include <cstring>
|
||||||
|
#include <cstdlib>
|
||||||
|
#include <stdint.h>
|
||||||
|
#include <cmath>
|
||||||
|
#include <limits>
|
||||||
|
#include "limonp/StringUtil.hpp"
|
||||||
|
#include "limonp/Logging.hpp"
|
||||||
|
#include "Unicode.hpp"
|
||||||
|
#include "DatTrie.hpp"
|
||||||
|
#include <QDebug>
|
||||||
|
namespace cppjieba {
|
||||||
|
|
||||||
|
using namespace limonp;
|
||||||
|
|
||||||
|
const size_t PINYIN_COLUMN_NUM = 2;
|
||||||
|
|
||||||
|
class PinYinTrie {
|
||||||
|
public:
|
||||||
|
enum UserWordWeightOption {
|
||||||
|
WordWeightMin,
|
||||||
|
WordWeightMedian,
|
||||||
|
WordWeightMax,
|
||||||
|
}; // enum UserWordWeightOption
|
||||||
|
|
||||||
|
PinYinTrie(const string& dict_path, const string & dat_cache_path = "",
|
||||||
|
UserWordWeightOption user_word_weight_opt = WordWeightMedian) {
|
||||||
|
Init(dict_path, dat_cache_path, user_word_weight_opt);
|
||||||
|
}
|
||||||
|
|
||||||
|
~PinYinTrie() {}
|
||||||
|
|
||||||
|
int getMultiTonResults(string word, QStringList &results) {
|
||||||
|
if (qmap_chinese2pinyin.contains(QString::fromStdString(word))) {
|
||||||
|
for (auto i:qmap_chinese2pinyin[QString::fromStdString(word)])
|
||||||
|
results.push_back(i);
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
return -1;
|
||||||
|
}
|
||||||
|
|
||||||
|
int getSingleTonResult(string word, QString &result) {
|
||||||
|
const PinYinMemElem * tmp = dat_.PinYinFind(word);
|
||||||
|
if (tmp) {
|
||||||
|
result = QString::fromStdString(tmp->GetTag());
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
return -1;
|
||||||
|
}
|
||||||
|
|
||||||
|
bool contains(string &word) {
|
||||||
|
if (qmap_chinese2pinyin.contains(QString::fromStdString(word))
|
||||||
|
or dat_.PinYinFind(word) != nullptr) // a non-null result means the word is present in the DAT
|
||||||
|
return true;
|
||||||
|
// if (map_chinese2pinyin.contains(word)
|
||||||
|
// or !dat_.PinYinFind(word))
|
||||||
|
// return true;
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
bool isMultiTone(const string &word) {
|
||||||
|
if (qmap_chinese2pinyin.contains(QString::fromStdString(word)))
|
||||||
|
return true;
|
||||||
|
// if (map_chinese2pinyin.contains(word))
|
||||||
|
// return true;
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
size_t GetTotalDictSize() const {
|
||||||
|
return total_dict_size_;
|
||||||
|
}
|
||||||
|
|
||||||
|
private:
|
||||||
|
void Init(const string& dict_path, string dat_cache_path,
|
||||||
|
UserWordWeightOption user_word_weight_opt) {
|
||||||
|
size_t file_size_sum = 0;
|
||||||
|
vector<PinYinElement> node_infos;
|
||||||
|
const string md5 = CalcFileListMD5(dict_path, file_size_sum);
|
||||||
|
total_dict_size_ = file_size_sum;
|
||||||
|
|
||||||
|
if (dat_cache_path.empty()) {
|
||||||
|
// If no location is given for the dictionary data file, it is stored under /tmp by default --jxx20200519
|
||||||
|
dat_cache_path = /*dict_path*/"/tmp/" + md5 + "." + to_string(user_word_weight_opt) + ".dat_cache";
|
||||||
|
}
|
||||||
|
QString path = QString::fromStdString(dat_cache_path);
|
||||||
|
qDebug() << "#########PinYin path:" << path << file_size_sum;
|
||||||
|
if (dat_.InitPinYinAttachDat(dat_cache_path, md5)) {
|
||||||
|
// Polyphonic characters still require a pass over the dictionary file
|
||||||
|
LoadDefaultPinYin(node_infos, dict_path, true);
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
LoadDefaultPinYin(node_infos, dict_path, false);
|
||||||
|
double min_weight = 0;
|
||||||
|
dat_.SetMinWeight(min_weight);
|
||||||
|
|
||||||
|
const auto build_ret = dat_.InitBuildDat(node_infos, dat_cache_path, md5);
|
||||||
|
assert(build_ret);
|
||||||
|
vector<PinYinElement>().swap(node_infos);
|
||||||
|
}
|
||||||
|
|
||||||
|
void LoadDefaultPinYin(vector<PinYinElement> &node_infos, const string& filePath, bool multiFlag) {
|
||||||
|
ifstream ifs(filePath.c_str());
|
||||||
|
if(not ifs.is_open()){
|
||||||
|
return ;
|
||||||
|
}
|
||||||
|
XCHECK(ifs.is_open()) << "open " << filePath << " failed.";
|
||||||
|
string line;
|
||||||
|
vector<string> buf;
|
||||||
|
size_t lineno = 0;
|
||||||
|
|
||||||
|
for (; getline(ifs, line); lineno++) {
|
||||||
|
if (line.empty()) {
|
||||||
|
XLOG(ERROR) << "lineno: " << lineno << " empty. skipped.";
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
Split(line, buf, " ");
|
||||||
|
if (buf.size() == PINYIN_COLUMN_NUM) {
|
||||||
|
if (multiFlag) { // not a polyphonic character
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
PinYinElement node_info;
|
||||||
|
node_info.word = buf[1];
|
||||||
|
node_info.tag = buf[0];
|
||||||
|
node_infos.push_back(node_info);
|
||||||
|
} else { // polyphonic character
|
||||||
|
QString content = QString::fromUtf8(line.c_str());
|
||||||
|
qmap_chinese2pinyin[content.split(" ").last().trimmed()] = content.split(" ");
|
||||||
|
qmap_chinese2pinyin[content.split(" ").last().trimmed()].pop_back();
|
||||||
|
/*
|
||||||
|
//std map string list
|
||||||
|
list<string> tmpList;
|
||||||
|
for(int i = 0; i < buf.size() - 1; ++i){
|
||||||
|
tmpList.push_back(buf[i]);
|
||||||
|
}
|
||||||
|
map[buf[buf.size() - 1]] = tmpList;
|
||||||
|
*/
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
private:
|
||||||
|
QMap<QString, QStringList> qmap_chinese2pinyin;
|
||||||
|
//map<string, list<string>> map_chinese2pinyin;
|
||||||
|
size_t total_dict_size_ = 0;
|
||||||
|
DatTrie dat_;
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
|
@ -0,0 +1,84 @@
|
||||||
|
#pragma once
|
||||||
|
|
||||||
|
#include "limonp/StringUtil.hpp"
|
||||||
|
#include "segment-trie/segment-trie.h"
|
||||||
|
//#include "DictTrie.hpp"
|
||||||
|
//#include "SegmentTagged.hpp"
|
||||||
|
|
||||||
|
namespace cppjieba {
|
||||||
|
using namespace limonp;
|
||||||
|
|
||||||
|
static const char* const POS_M = "m";
|
||||||
|
static const char* const POS_ENG = "eng";
|
||||||
|
static const char* const POS_X = "x";
|
||||||
|
|
||||||
|
class PosTagger {
|
||||||
|
public:
|
||||||
|
PosTagger() {
|
||||||
|
}
|
||||||
|
~PosTagger() {
|
||||||
|
}
|
||||||
|
|
||||||
|
bool Tag(const string& src, vector<pair<string, string> >& res, const SegmentTagged& segment) const {
|
||||||
|
vector<string> CutRes;
|
||||||
|
segment.CutToStr(src, CutRes);
|
||||||
|
|
||||||
|
for (vector<string>::iterator itr = CutRes.begin(); itr != CutRes.end(); ++itr) {
|
||||||
|
res.push_back(make_pair(*itr, LookupTag(*itr, segment)));
|
||||||
|
}
|
||||||
|
|
||||||
|
return !res.empty();
|
||||||
|
}
|
||||||
|
|
||||||
|
string LookupTag(const string &str, const SegmentTagged& segment) const {
|
||||||
|
const DictTrie * dict = segment.GetDictTrie();
|
||||||
|
assert(dict != nullptr);
|
||||||
|
const auto tmp = dict->Find(str);
|
||||||
|
|
||||||
|
if (tmp == nullptr || tmp->GetTag().empty()) {
|
||||||
|
RuneStrArray runes;
|
||||||
|
|
||||||
|
if (!DecodeRunesInString(str, runes)) {
|
||||||
|
XLOG(ERROR) << "Decode failed.";
|
||||||
|
return POS_X;
|
||||||
|
}
|
||||||
|
|
||||||
|
return SpecialRule(runes);
|
||||||
|
} else {
|
||||||
|
return tmp->GetTag();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
private:
|
||||||
|
const char* SpecialRule(const RuneStrArray& unicode) const {
|
||||||
|
size_t m = 0;
|
||||||
|
size_t eng = 0;
|
||||||
|
|
||||||
|
for (size_t i = 0; i < unicode.size() && eng < unicode.size() / 2; i++) {
|
||||||
|
if (unicode[i].rune < 0x80) {
|
||||||
|
eng ++;
|
||||||
|
|
||||||
|
if ('0' <= unicode[i].rune && unicode[i].rune <= '9') {
|
||||||
|
m++;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// ascii char is not found
|
||||||
|
if (eng == 0) {
|
||||||
|
return POS_X;
|
||||||
|
}
|
||||||
|
|
||||||
|
// all the ascii is number char
|
||||||
|
if (m == eng) {
|
||||||
|
return POS_M;
|
||||||
|
}
|
||||||
|
|
||||||
|
// the ascii chars contain english letter
|
||||||
|
return POS_ENG;
|
||||||
|
}
|
||||||
|
|
||||||
|
}; // class PosTagger
|
||||||
|
|
||||||
|
} // namespace cppjieba
|
||||||
|
|
|
@ -0,0 +1,127 @@
|
||||||
|
#pragma once
|
||||||
|
|
||||||
|
#include "limonp/Logging.hpp"
|
||||||
|
#include <unordered_set>
|
||||||
|
#include "Unicode.hpp"
|
||||||
|
|
||||||
|
namespace cppjieba {
|
||||||
|
|
||||||
|
class PreFilter {
|
||||||
|
public:
|
||||||
|
PreFilter(const std::unordered_set<Rune>& symbols,
|
||||||
|
const string& sentence)
|
||||||
|
: symbols_(symbols) {
|
||||||
|
if (!DecodeRunesInString(sentence, sentence_)) {
|
||||||
|
XLOG(ERROR) << "decode failed. "<<sentence;
|
||||||
|
}
|
||||||
|
|
||||||
|
cursor_ = sentence_.begin();
|
||||||
|
}
|
||||||
|
~PreFilter() {
|
||||||
|
}
|
||||||
|
bool HasNext() const {
|
||||||
|
return cursor_ != sentence_.end();
|
||||||
|
}
|
||||||
|
bool Next(WordRange& wordRange) {
|
||||||
|
|
||||||
|
if (cursor_ == sentence_.end()) {
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
wordRange.left = cursor_;
|
||||||
|
|
||||||
|
while (cursor_ != sentence_.end() && cursor_->rune == 0x20) { // check for end() before dereferencing
|
||||||
|
cursor_++;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (cursor_ == sentence_.end()) {
|
||||||
|
wordRange.right = cursor_;
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
|
||||||
|
while (++cursor_ != sentence_.end()) {
|
||||||
|
if (cursor_->rune == 0x20) {
|
||||||
|
wordRange.right = cursor_;
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
wordRange.right = sentence_.end();
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
|
||||||
|
bool Next(WordRange& wordRange, bool& isNull) {
|
||||||
|
isNull = false;
|
||||||
|
if (cursor_ == sentence_.end()) {
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
wordRange.left = cursor_;
|
||||||
|
if (cursor_->rune == 0x20) {
|
||||||
|
while (cursor_ != sentence_.end()) {
|
||||||
|
if (cursor_->rune != 0x20) {
|
||||||
|
if (wordRange.left == cursor_) {
|
||||||
|
cursor_ ++;
|
||||||
|
}
|
||||||
|
wordRange.right = cursor_;
|
||||||
|
isNull = true;
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
cursor_ ++;
|
||||||
|
}
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
int max_num = 0;
|
||||||
|
uint32_t utf8_num = cursor_->len;
|
||||||
|
|
||||||
|
while (cursor_ != sentence_.end()) {
|
||||||
|
if (cursor_->rune == 0x20) {
|
||||||
|
if (wordRange.left == cursor_) {
|
||||||
|
cursor_ ++;
|
||||||
|
}
|
||||||
|
|
||||||
|
wordRange.right = cursor_;
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
|
||||||
|
cursor_ ++;
|
||||||
|
max_num++;
|
||||||
|
if (max_num >= 1024 or cursor_ == sentence_.end() or cursor_->len != utf8_num) { // TODO: avoid passing in too many bytes at once; provisionally limited to 1024 characters
|
||||||
|
wordRange.right = cursor_;
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
wordRange.right = sentence_.end();
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
|
||||||
|
WordRange Next() {
|
||||||
|
WordRange range(cursor_, cursor_);
|
||||||
|
|
||||||
|
while (cursor_ != sentence_.end()) {
|
||||||
|
//if (IsIn(symbols_, cursor_->rune)) {
|
||||||
|
if (cursor_->rune == 0x20) {
|
||||||
|
if (range.left == cursor_) {
|
||||||
|
cursor_ ++;
|
||||||
|
}
|
||||||
|
|
||||||
|
range.right = cursor_;
|
||||||
|
return range;
|
||||||
|
}
|
||||||
|
|
||||||
|
cursor_ ++;
|
||||||
|
}
|
||||||
|
|
||||||
|
range.right = sentence_.end();
|
||||||
|
return range;
|
||||||
|
}
|
||||||
|
private:
|
||||||
|
RuneStrArray::const_iterator cursor_;
|
||||||
|
RuneStrArray sentence_;
|
||||||
|
const std::unordered_set<Rune>& symbols_;
|
||||||
|
}; // class PreFilter
|
||||||
|
|
||||||
|
} // namespace cppjieba
|
||||||
|
|
|
@ -0,0 +1,89 @@
|
||||||
|
#pragma once
|
||||||
|
|
||||||
|
#include <algorithm>
|
||||||
|
#include <set>
|
||||||
|
#include <cassert>
|
||||||
|
#include "limonp/Logging.hpp"
|
||||||
|
#include "SegmentBase.hpp"
|
||||||
|
#include "FullSegment.hpp"
|
||||||
|
#include "MixSegment.hpp"
|
||||||
|
#include "Unicode.hpp"
|
||||||
|
|
||||||
|
namespace cppjieba {
|
||||||
|
class QuerySegment: public SegmentBase {
|
||||||
|
public:
|
||||||
|
QuerySegment(const DictTrie* dictTrie,
|
||||||
|
const HMMModel* model,
|
||||||
|
const string& stopWordPath)
|
||||||
|
: mixSeg_(dictTrie, model, stopWordPath), trie_(dictTrie) {
|
||||||
|
}
|
||||||
|
~QuerySegment() {
|
||||||
|
}
|
||||||
|
|
||||||
|
virtual void Cut(RuneStrArray::const_iterator begin, RuneStrArray::const_iterator end, vector<WordRange>& res, bool hmm,
|
||||||
|
size_t) const override {
|
||||||
|
//use mix Cut first
|
||||||
|
vector<WordRange> mixRes;
|
||||||
|
mixSeg_.CutRuneArray(begin, end, mixRes, hmm);
|
||||||
|
|
||||||
|
vector<WordRange> fullRes;
|
||||||
|
|
||||||
|
for (vector<WordRange>::const_iterator mixResItr = mixRes.begin(); mixResItr != mixRes.end(); mixResItr++) {
|
||||||
|
if (mixResItr->Length() > 2) {
|
||||||
|
for (size_t i = 0; i + 1 < mixResItr->Length(); i++) {
|
||||||
|
string text = EncodeRunesToString(mixResItr->left + i, mixResItr->left + i + 2);
|
||||||
|
|
||||||
|
if (trie_->Find(text) != nullptr) {
|
||||||
|
WordRange wr(mixResItr->left + i, mixResItr->left + i + 1);
|
||||||
|
res.push_back(wr);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
if (mixResItr->Length() > 3) {
|
||||||
|
for (size_t i = 0; i + 2 < mixResItr->Length(); i++) {
|
||||||
|
string text = EncodeRunesToString(mixResItr->left + i, mixResItr->left + i + 3);
|
||||||
|
|
||||||
|
if (trie_->Find(text) != nullptr) {
|
||||||
|
WordRange wr(mixResItr->left + i, mixResItr->left + i + 2);
|
||||||
|
res.push_back(wr);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
res.push_back(*mixResItr);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
virtual void CutWithSentence(const string& s, RuneStrArray::const_iterator begin, RuneStrArray::const_iterator end, vector<string>& res, bool hmm,
|
||||||
|
size_t) const override {
|
||||||
|
std::ignore = s;
|
||||||
|
std::ignore = begin;
|
||||||
|
std::ignore = end;
|
||||||
|
std::ignore = res;
|
||||||
|
std::ignore = hmm;
|
||||||
|
}
|
||||||
|
virtual void CutWithSentence(const string& s, RuneStrArray::const_iterator begin, RuneStrArray::const_iterator end, unordered_map<string, KeyWord>& res, bool hmm,
|
||||||
|
size_t) const override {
|
||||||
|
std::ignore = s;
|
||||||
|
std::ignore = begin;
|
||||||
|
std::ignore = end;
|
||||||
|
std::ignore = res;
|
||||||
|
std::ignore = hmm;
|
||||||
|
}
|
||||||
|
private:
|
||||||
|
bool IsAllAscii(const RuneArray& s) const {
|
||||||
|
for (size_t i = 0; i < s.size(); i++) {
|
||||||
|
if (s[i] >= 0x80) {
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
MixSegment mixSeg_;
|
||||||
|
const DictTrie* trie_;
|
||||||
|
}; // QuerySegment
|
||||||
|
|
||||||
|
} // namespace cppjieba
|
||||||
|
|
|
@ -0,0 +1,94 @@
|
||||||
|
#pragma once
|
||||||
|
|
||||||
|
#include "limonp/Logging.hpp"
|
||||||
|
#include "PreFilter.hpp"
|
||||||
|
#include <cassert>
|
||||||
|
|
||||||
|
|
||||||
|
namespace cppjieba {
|
||||||
|
|
||||||
|
const char* const SPECIAL_SEPARATORS = " \t\n\xEF\xBC\x8C\xE3\x80\x82";
|
||||||
|
|
||||||
|
using namespace limonp;
|
||||||
|
|
||||||
|
class SegmentBase {
|
||||||
|
public:
|
||||||
|
SegmentBase() {
|
||||||
|
XCHECK(ResetSeparators(SPECIAL_SEPARATORS));
|
||||||
|
}
|
||||||
|
virtual ~SegmentBase() { }
|
||||||
|
|
||||||
|
virtual void Cut(RuneStrArray::const_iterator begin, RuneStrArray::const_iterator end, vector<WordRange>& res, bool hmm,
|
||||||
|
size_t max_word_len) const = 0;
|
||||||
|
// Added sentence-based cut methods to reduce intermediate storage and format conversions --jxx20210517
|
||||||
|
virtual void CutWithSentence(const string& s, RuneStrArray::const_iterator begin, RuneStrArray::const_iterator end, vector<string>& res, bool hmm,
|
||||||
|
size_t max_word_len) const = 0;
|
||||||
|
virtual void CutWithSentence(const string& s, RuneStrArray::const_iterator begin, RuneStrArray::const_iterator end, unordered_map<string, KeyWord>& res, bool hmm,
|
||||||
|
size_t max_word_len) const = 0;
|
||||||
|
// Rewrote CutToStr to simplify obtaining vector<string>& words and to lower memory usage --jxx20210517
|
||||||
|
void CutToStr(const string& sentence, vector<string>& words, bool hmm = true,
|
||||||
|
size_t max_word_len = MAX_WORD_LENGTH) const {
|
||||||
|
PreFilter pre_filter(symbols_, sentence);
|
||||||
|
words.clear();
|
||||||
|
words.reserve(sentence.size() / 2); // TODO: follows the upstream source; the divisor is still to be decided
|
||||||
|
RuneStrArray::const_iterator null_p;
|
||||||
|
WordRange range(null_p, null_p);
|
||||||
|
while (pre_filter.Next(range)) {
|
||||||
|
CutWithSentence(sentence, range.left, range.right, words, hmm, max_word_len);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
void CutToStr(const string& sentence, WordRange range, vector<string>& words, bool hmm = true,
|
||||||
|
size_t max_word_len = MAX_WORD_LENGTH) const {
|
||||||
|
CutWithSentence(sentence, range.left, range.right, words, hmm, max_word_len);
|
||||||
|
}
|
||||||
|
void CutToStr(const string& sentence, WordRange range, unordered_map<string, KeyWord>& words, bool hmm = true,
|
||||||
|
size_t max_word_len = MAX_WORD_LENGTH) const {
|
||||||
|
CutWithSentence(sentence, range.left, range.right, words, hmm, max_word_len);
|
||||||
|
}
|
||||||
|
void CutToWord(const string& sentence, vector<Word>& words, bool hmm = true,
|
||||||
|
size_t max_word_len = MAX_WORD_LENGTH) const {
|
||||||
|
PreFilter pre_filter(symbols_, sentence);
|
||||||
|
vector<WordRange> wrs;
|
||||||
|
wrs.reserve(sentence.size() / 2);
|
||||||
|
|
||||||
|
while (pre_filter.HasNext()) {
|
||||||
|
auto range = pre_filter.Next();
|
||||||
|
Cut(range.left, range.right, wrs, hmm, max_word_len);
|
||||||
|
}
|
||||||
|
|
||||||
|
words.clear();
|
||||||
|
words.reserve(wrs.size());
|
||||||
|
GetWordsFromWordRanges(sentence, wrs, words);
|
||||||
|
wrs.clear();
|
||||||
|
vector<WordRange>().swap(wrs);
|
||||||
|
}
|
||||||
|
|
||||||
|
void CutRuneArray(RuneStrArray::const_iterator begin, RuneStrArray::const_iterator end, vector<WordRange>& res,
|
||||||
|
bool hmm = true, size_t max_word_len = MAX_WORD_LENGTH) const {
|
||||||
|
Cut(begin, end, res, hmm, max_word_len);
|
||||||
|
}
|
||||||
|
|
||||||
|
bool ResetSeparators(const string& s) {
|
||||||
|
symbols_.clear();
|
||||||
|
RuneStrArray runes;
|
||||||
|
|
||||||
|
if (!DecodeRunesInString(s, runes)) {
|
||||||
|
XLOG(ERROR) << "decode " << s << " failed";
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
for (size_t i = 0; i < runes.size(); i++) {
|
||||||
|
if (!symbols_.insert(runes[i].rune).second) {
|
||||||
|
XLOG(ERROR) << s.substr(runes[i].offset, runes[i].len) << " already exists";
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
protected:
|
||||||
|
unordered_set<Rune> symbols_;
|
||||||
|
}; // class SegmentBase
|
||||||
|
|
||||||
|
} // cppjieba
|
||||||
|
|
|
@ -0,0 +1,21 @@
|
||||||
|
#pragma once
|
||||||
|
|
||||||
|
#include "SegmentBase.hpp"
|
||||||
|
|
||||||
|
namespace cppjieba {
|
||||||
|
|
||||||
|
class SegmentTagged : public SegmentBase {
|
||||||
|
public:
|
||||||
|
SegmentTagged() {
|
||||||
|
}
|
||||||
|
virtual ~SegmentTagged() {
|
||||||
|
}
|
||||||
|
|
||||||
|
virtual bool Tag(const string& src, vector<pair<string, string> >& res) const = 0;
|
||||||
|
|
||||||
|
virtual const DictTrie* GetDictTrie() const = 0;
|
||||||
|
|
||||||
|
}; // class SegmentTagged
|
||||||
|
|
||||||
|
} // cppjieba
|
||||||
|
|
|
@ -0,0 +1,205 @@
|
||||||
|
|
||||||
|
#include <cmath>
|
||||||
|
#include "Jieba.hpp"
|
||||||
|
|
||||||
|
namespace cppjieba {
|
||||||
|
using namespace limonp;
|
||||||
|
using namespace std;
|
||||||
|
|
||||||
|
class TextRankExtractor {
|
||||||
|
public:
|
||||||
|
typedef struct _Word {
|
||||||
|
string word;
|
||||||
|
vector<size_t> offsets;
|
||||||
|
double weight;
|
||||||
|
} Word; // struct Word
|
||||||
|
private:
|
||||||
|
typedef std::map<string, Word> WordMap;
|
||||||
|
|
||||||
|
class WordGraph {
|
||||||
|
private:
|
||||||
|
typedef double Score;
|
||||||
|
typedef string Node;
|
||||||
|
typedef std::set<Node> NodeSet;
|
||||||
|
|
||||||
|
typedef std::map<Node, double> Edges;
|
||||||
|
typedef std::map<Node, Edges> Graph;
|
||||||
|
//typedef std::unordered_map<Node,double> Edges;
|
||||||
|
//typedef std::unordered_map<Node,Edges> Graph;
|
||||||
|
|
||||||
|
double d;
|
||||||
|
Graph graph;
|
||||||
|
NodeSet nodeSet;
|
||||||
|
public:
|
||||||
|
WordGraph(): d(0.85) {};
|
||||||
|
WordGraph(double in_d): d(in_d) {};
|
||||||
|
|
||||||
|
void addEdge(Node start, Node end, double weight) {
|
||||||
|
Edges temp;
|
||||||
|
Edges::iterator gotEdges;
|
||||||
|
nodeSet.insert(start);
|
||||||
|
nodeSet.insert(end);
|
||||||
|
graph[start][end] += weight;
|
||||||
|
graph[end][start] += weight;
|
||||||
|
}
|
||||||
|
|
||||||
|
void rank(WordMap &ws, size_t rankTime = 10) {
|
||||||
|
WordMap outSum;
|
||||||
|
Score wsdef, min_rank, max_rank;
|
||||||
|
|
||||||
|
if (graph.size() == 0) {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
wsdef = 1.0 / graph.size();
|
||||||
|
|
||||||
|
for (Graph::iterator edges = graph.begin(); edges != graph.end(); ++edges) {
|
||||||
|
// edges->first: start node; edge->first: end node; edge->second: weight
|
||||||
|
ws[edges->first].word = edges->first;
|
||||||
|
ws[edges->first].weight = wsdef;
|
||||||
|
outSum[edges->first].weight = 0;
|
||||||
|
|
||||||
|
for (Edges::iterator edge = edges->second.begin(); edge != edges->second.end(); ++edge) {
|
||||||
|
outSum[edges->first].weight += edge->second;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
//sort(nodeSet.begin(),nodeSet.end()); is sorting needed?
|
||||||
|
for (size_t i = 0; i < rankTime; i++) {
|
||||||
|
for (NodeSet::iterator node = nodeSet.begin(); node != nodeSet.end(); node++) {
|
||||||
|
double s = 0;
|
||||||
|
|
||||||
|
for (Edges::iterator edge = graph[*node].begin(); edge != graph[*node].end(); edge++)
|
||||||
|
// edge->first: end node; edge->second: weight
|
||||||
|
{
|
||||||
|
s += edge->second / outSum[edge->first].weight * ws[edge->first].weight;
|
||||||
|
}
|
||||||
|
|
||||||
|
ws[*node].weight = (1 - d) + d * s;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
min_rank = max_rank = ws.begin()->second.weight;
|
||||||
|
|
||||||
|
for (WordMap::iterator i = ws.begin(); i != ws.end(); i ++) {
|
||||||
|
if (i->second.weight < min_rank) {
|
||||||
|
min_rank = i->second.weight;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (i->second.weight > max_rank) {
|
||||||
|
max_rank = i->second.weight;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
for (WordMap::iterator i = ws.begin(); i != ws.end(); i ++) {
|
||||||
|
ws[i->first].weight = (i->second.weight - min_rank / 10.0) / (max_rank - min_rank / 10.0);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
public:
|
||||||
|
TextRankExtractor(const DictTrie* dictTrie,
|
||||||
|
const HMMModel* model,
|
||||||
|
const string& stopWordPath)
|
||||||
|
: segment_(dictTrie, model) {
|
||||||
|
LoadStopWordDict(stopWordPath);
|
||||||
|
}
|
||||||
|
TextRankExtractor(const Jieba& jieba, const string& stopWordPath) : segment_(jieba.GetDictTrie(), jieba.GetHMMModel()) {
|
||||||
|
LoadStopWordDict(stopWordPath);
|
||||||
|
}
|
||||||
|
~TextRankExtractor() {
|
||||||
|
}
|
||||||
|
|
||||||
|
void Extract(const string& sentence, vector<string>& keywords, size_t topN) const {
|
||||||
|
vector<Word> topWords;
|
||||||
|
Extract(sentence, topWords, topN);
|
||||||
|
|
||||||
|
for (size_t i = 0; i < topWords.size(); i++) {
|
||||||
|
keywords.push_back(topWords[i].word);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
void Extract(const string& sentence, vector<pair<string, double> >& keywords, size_t topN) const {
|
||||||
|
vector<Word> topWords;
|
||||||
|
Extract(sentence, topWords, topN);
|
||||||
|
|
||||||
|
for (size_t i = 0; i < topWords.size(); i++) {
|
||||||
|
keywords.push_back(pair<string, double>(topWords[i].word, topWords[i].weight));
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
void Extract(const string& sentence, vector<Word>& keywords, size_t topN, size_t span = 5, size_t rankTime = 10) const {
|
||||||
|
vector<string> words;
|
||||||
|
segment_.CutToStr(sentence, words);
|
||||||
|
|
||||||
|
TextRankExtractor::WordGraph graph;
|
||||||
|
WordMap wordmap;
|
||||||
|
size_t offset = 0;
|
||||||
|
|
||||||
|
for (size_t i = 0; i < words.size(); i++) {
|
||||||
|
size_t t = offset;
|
||||||
|
offset += words[i].size();
|
||||||
|
|
||||||
|
if (IsSingleWord(words[i]) || stopWords_.find(words[i]) != stopWords_.end()) {
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
|
for (size_t j = i + 1, skip = 0; j < i + span + skip && j < words.size(); j++) {
|
||||||
|
if (IsSingleWord(words[j]) || stopWords_.find(words[j]) != stopWords_.end()) {
|
||||||
|
skip++;
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
|
graph.addEdge(words[i], words[j], 1);
|
||||||
|
}
|
||||||
|
|
||||||
|
wordmap[words[i]].offsets.push_back(t);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (offset != sentence.size()) {
|
||||||
|
XLOG(ERROR) << "words illegal";
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
graph.rank(wordmap, rankTime);
|
||||||
|
|
||||||
|
keywords.clear();
|
||||||
|
keywords.reserve(wordmap.size());
|
||||||
|
|
||||||
|
for (WordMap::iterator itr = wordmap.begin(); itr != wordmap.end(); ++itr) {
|
||||||
|
keywords.push_back(itr->second);
|
||||||
|
}
|
||||||
|
|
||||||
|
topN = min(topN, keywords.size());
|
||||||
|
partial_sort(keywords.begin(), keywords.begin() + topN, keywords.end(), Compare);
|
||||||
|
keywords.resize(topN);
|
||||||
|
}
|
||||||
|
private:
|
||||||
|
void LoadStopWordDict(const string& filePath) {
|
||||||
|
ifstream ifs(filePath.c_str());
|
||||||
|
XCHECK(ifs.is_open()) << "open " << filePath << " failed";
|
||||||
|
string line ;
|
||||||
|
|
||||||
|
while (getline(ifs, line)) {
|
||||||
|
stopWords_.insert(line);
|
||||||
|
}
|
||||||
|
|
||||||
|
assert(stopWords_.size());
|
||||||
|
}
|
||||||
|
|
||||||
|
static bool Compare(const Word &x, const Word &y) {
|
||||||
|
return x.weight > y.weight;
|
||||||
|
}
|
||||||
|
|
||||||
|
MixSegment segment_;
|
||||||
|
unordered_set<string> stopWords_;
|
||||||
|
}; // class TextRankExtractor
|
||||||
|
|
||||||
|
inline ostream& operator << (ostream& os, const TextRankExtractor::Word& word) {
|
||||||
|
return os << "{\"word\": \"" << word.word << "\", \"offset\": " << word.offsets << ", \"weight\": " << word.weight <<
|
||||||
|
"}";
|
||||||
|
}
|
||||||
|
} // namespace cppjieba
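
The ranking loop in `WordGraph::rank` above can be illustrated in isolation. The following is a minimal sketch, not the cppjieba implementation: the names `SketchGraph`, `SketchAddEdge` and `SketchRank` are hypothetical, and the damping factor `d = 0.85` is an assumption, since the value of the member `d` is not visible in this hunk. Like the original, it updates scores in place within each pass.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-alone types; not part of cppjieba.
typedef std::map<std::string, std::map<std::string, double> > SketchGraph;

// Adds an undirected edge, accumulating the weight when a pair repeats.
inline void SketchAddEdge(SketchGraph& g, const std::string& a,
                          const std::string& b, double w) {
  g[a][b] += w;
  g[b][a] += w;
}

// Applies score(v) = (1 - d) + d * sum_u( w(u,v) / out(u) * score(u) )
// for a fixed number of passes. d = 0.85 is an assumed default.
inline std::map<std::string, double> SketchRank(const SketchGraph& g,
                                                std::size_t iterations,
                                                double d = 0.85) {
  std::map<std::string, double> score;
  std::map<std::string, double> outSum;
  if (g.empty()) {
    return score;
  }
  for (const auto& node : g) {
    score[node.first] = 1.0 / g.size();  // uniform initial score
    double sum = 0.0;
    for (const auto& edge : node.second) {
      sum += edge.second;
    }
    outSum[node.first] = sum;  // total weight leaving this node
  }
  for (std::size_t i = 0; i < iterations; ++i) {
    for (const auto& node : g) {
      double s = 0.0;
      for (const auto& edge : node.second) {
        s += edge.second / outSum[edge.first] * score[edge.first];
      }
      score[node.first] = (1 - d) + d * s;  // in-place update
    }
  }
  return score;
}
```

In a three-node chain `a - b - c`, the middle node collects score from both neighbours and ends up ranked highest, which is the behaviour the extractor relies on when it picks the top `topN` words.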
|
||||||
|
|
||||||
|
|
||||||
|
|
|
@ -0,0 +1,264 @@
|
||||||
|
#pragma once
|
||||||
|
|
||||||
|
#include <stdint.h>
|
||||||
|
#include <stdlib.h>
|
||||||
|
#include <string>
|
||||||
|
#include <vector>
|
||||||
|
#include <ostream>
|
||||||
|
#include "limonp/LocalVector.hpp"
|
||||||
|
#include "limonp/StringUtil.hpp"
|
||||||
|
#include "common-struct.h"
|
||||||
|
|
||||||
|
namespace cppjieba {
|
||||||
|
|
||||||
|
using std::string;
|
||||||
|
using std::vector;
|
||||||
|
|
||||||
|
typedef uint32_t Rune;
|
||||||
|
|
||||||
|
inline std::ostream& operator << (std::ostream& os, const Word& w) {
|
||||||
|
return os << "{\"word\": \"" << w.word << "\", \"offset\": " << w.offset << "}";
|
||||||
|
}
|
||||||
|
|
||||||
|
struct DatMemElem {
|
||||||
|
double weight = 0.0;
|
||||||
|
char tag[8] = {};
|
||||||
|
|
||||||
|
void SetTag(const string & str) {
|
||||||
|
memset(&tag[0], 0, sizeof(tag));
|
||||||
|
strncpy(&tag[0], str.c_str(), std::min(str.size(), sizeof(tag) - 1));
|
||||||
|
}
|
||||||
|
|
||||||
|
string GetTag() const {
|
||||||
|
return &tag[0];
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
struct DatDag {
|
||||||
|
limonp::LocalVector<pair<size_t, const DatMemElem *> > nexts;
|
||||||
|
//double max_weight;
|
||||||
|
//size_t max_next;
|
||||||
|
};
|
||||||
|
|
||||||
|
struct RuneInfo {
|
||||||
|
Rune rune;
|
||||||
|
uint32_t offset;
|
||||||
|
uint32_t len;
|
||||||
|
uint32_t unicode_offset = 0;
|
||||||
|
uint32_t unicode_length = 0;
|
||||||
|
RuneInfo(): rune(0), offset(0), len(0) {
|
||||||
|
}
|
||||||
|
RuneInfo(Rune r, uint32_t o, uint32_t l)
|
||||||
|
: rune(r), offset(o), len(l) {
|
||||||
|
}
|
||||||
|
RuneInfo(Rune r, uint32_t o, uint32_t l, uint32_t unicode_offset, uint32_t unicode_length)
|
||||||
|
: rune(r), offset(o), len(l), unicode_offset(unicode_offset), unicode_length(unicode_length) {
|
||||||
|
}
|
||||||
|
}; // struct RuneInfo
|
||||||
|
|
||||||
|
inline std::ostream& operator << (std::ostream& os, const RuneInfo& r) {
|
||||||
|
return os << "{\"rune\": \"" << r.rune << "\", \"offset\": " << r.offset << ", \"len\": " << r.len << "}";
|
||||||
|
}
|
||||||
|
|
||||||
|
typedef limonp::LocalVector<Rune> RuneArray;
|
||||||
|
typedef limonp::LocalVector<struct RuneInfo> RuneStrArray;
|
||||||
|
|
||||||
|
// [left, right]
|
||||||
|
struct WordRange {
|
||||||
|
RuneStrArray::const_iterator left;
|
||||||
|
RuneStrArray::const_iterator right;
|
||||||
|
WordRange(RuneStrArray::const_iterator l, RuneStrArray::const_iterator r)
|
||||||
|
: left(l), right(r) {
|
||||||
|
}
|
||||||
|
size_t Length() const {
|
||||||
|
return right - left;
|
||||||
|
}
|
||||||
|
|
||||||
|
bool IsAllAscii() const {
|
||||||
|
for (RuneStrArray::const_iterator iter = left; iter <= right; ++iter) {
|
||||||
|
if (iter->rune >= 0x80) {
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
}; // struct WordRange
|
||||||
|
|
||||||
|
|
||||||
|
inline bool DecodeRunesInString(const string& s, RuneArray& arr) {
|
||||||
|
arr.clear();
|
||||||
|
return limonp::Utf8ToUnicode32(s, arr);
|
||||||
|
}
|
||||||
|
|
||||||
|
inline RuneArray DecodeRunesInString(const string& s) {
|
||||||
|
RuneArray result;
|
||||||
|
DecodeRunesInString(s, result);
|
||||||
|
return result;
|
||||||
|
}
|
||||||
|
|
||||||
|
inline bool DecodeRunesInString(const string& s, RuneStrArray& runes) {
|
||||||
|
|
||||||
|
uint32_t tmp;
|
||||||
|
uint32_t offset = 0;
|
||||||
|
runes.clear();
|
||||||
|
uint32_t len(0);
|
||||||
|
for (size_t i = 0; i < s.size();) {
|
||||||
|
if (!(s.data()[i] & 0x80)) { // 0xxxxxxx
|
||||||
|
// 7bit, total 7bit
|
||||||
|
tmp = (uint8_t)(s.data()[i]) & 0x7f;
|
||||||
|
i++;
|
||||||
|
len = 1;
|
||||||
|
} else if ((uint8_t)s.data()[i] <= 0xdf && i + 1 < s.size()) { // 110xxxxx
|
||||||
|
// 5bit, total 5bit
|
||||||
|
tmp = (uint8_t)(s.data()[i]) & 0x1f;
|
||||||
|
|
||||||
|
// 6bit, total 11bit
|
||||||
|
tmp <<= 6;
|
||||||
|
tmp |= (uint8_t)(s.data()[i+1]) & 0x3f;
|
||||||
|
i += 2;
|
||||||
|
len = 2;
|
||||||
|
} else if((uint8_t)s.data()[i] <= 0xef && i + 2 < s.size()) { // 1110xxxx
|
||||||
|
// 4bit, total 4bit
|
||||||
|
tmp = (uint8_t)(s.data()[i]) & 0x0f;
|
||||||
|
|
||||||
|
// 6bit, total 10bit
|
||||||
|
tmp <<= 6;
|
||||||
|
tmp |= (uint8_t)(s.data()[i+1]) & 0x3f;
|
||||||
|
|
||||||
|
// 6bit, total 16bit
|
||||||
|
tmp <<= 6;
|
||||||
|
tmp |= (uint8_t)(s.data()[i+2]) & 0x3f;
|
||||||
|
|
||||||
|
i += 3;
|
||||||
|
len = 3;
|
||||||
|
} else if((uint8_t)s.data()[i] <= 0xf7 && i + 3 < s.size()) { // 11110xxx
|
||||||
|
// 3bit, total 3bit
|
||||||
|
tmp = (uint8_t)(s.data()[i]) & 0x07;
|
||||||
|
|
||||||
|
// 6bit, total 9bit
|
||||||
|
tmp <<= 6;
|
||||||
|
tmp |= (uint8_t)(s.data()[i+1]) & 0x3f;
|
||||||
|
|
||||||
|
// 6bit, total 15bit
|
||||||
|
tmp <<= 6;
|
||||||
|
tmp |= (uint8_t)(s.data()[i+2]) & 0x3f;
|
||||||
|
|
||||||
|
// 6bit, total 21bit
|
||||||
|
tmp <<= 6;
|
||||||
|
tmp |= (uint8_t)(s.data()[i+3]) & 0x3f;
|
||||||
|
|
||||||
|
i += 4;
|
||||||
|
len = 4;
|
||||||
|
} else {
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
RuneInfo x(tmp, offset, len, i, 1);
|
||||||
|
runes.push_back(x);
|
||||||
|
offset += len;
|
||||||
|
}
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
|
||||||
|
class RunePtrWrapper {
|
||||||
|
public:
|
||||||
|
const RuneInfo * m_ptr = nullptr;
|
||||||
|
|
||||||
|
public:
|
||||||
|
explicit RunePtrWrapper(const RuneInfo * p) : m_ptr(p) {}
|
||||||
|
|
||||||
|
uint32_t operator *() {
|
||||||
|
return m_ptr->rune;
|
||||||
|
}
|
||||||
|
|
||||||
|
RunePtrWrapper operator ++(int) {
|
||||||
|
m_ptr ++;
|
||||||
|
return RunePtrWrapper(m_ptr);
|
||||||
|
}
|
||||||
|
|
||||||
|
bool operator !=(const RunePtrWrapper & b) const {
|
||||||
|
return this->m_ptr != b.m_ptr;
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
inline string EncodeRunesToString(RuneStrArray::const_iterator begin, RuneStrArray::const_iterator end) {
|
||||||
|
string str;
|
||||||
|
RunePtrWrapper it_begin(begin), it_end(end);
|
||||||
|
limonp::Unicode32ToUtf8(it_begin, it_end, str);
|
||||||
|
return str;
|
||||||
|
}
|
||||||
|
|
||||||
|
inline void EncodeRunesToString(RuneStrArray::const_iterator begin, RuneStrArray::const_iterator end, string& str) {
|
||||||
|
RunePtrWrapper it_begin(begin), it_end(end);
|
||||||
|
limonp::Unicode32ToUtf8(it_begin, it_end, str);
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
class Unicode32Counter {
|
||||||
|
public :
|
||||||
|
size_t length = 0;
|
||||||
|
void clear() {
|
||||||
|
length = 0;
|
||||||
|
}
|
||||||
|
void push_back(uint32_t) {
|
||||||
|
++length;
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
inline size_t Utf8CharNum(const char * str, size_t length) {
|
||||||
|
Unicode32Counter c;
|
||||||
|
|
||||||
|
if (limonp::Utf8ToUnicode32(str, length, c)) {
|
||||||
|
return c.length;
|
||||||
|
}
|
||||||
|
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
inline size_t Utf8CharNum(const string & str) {
|
||||||
|
return Utf8CharNum(str.data(), str.size());
|
||||||
|
}
|
||||||
|
|
||||||
|
inline bool IsSingleWord(const string& str) {
|
||||||
|
return Utf8CharNum(str) == 1;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
// [left, right]
|
||||||
|
inline Word GetWordFromRunes(const string& s, RuneStrArray::const_iterator left, RuneStrArray::const_iterator right) {
|
||||||
|
assert(right->offset >= left->offset);
|
||||||
|
uint32_t len = right->offset - left->offset + right->len;
|
||||||
|
uint32_t unicode_length = right->unicode_offset - left->unicode_offset + right->unicode_length;
|
||||||
|
return Word(s.substr(left->offset, len), left->offset, left->unicode_offset, unicode_length);
|
||||||
|
}
|
||||||
|
|
||||||
|
inline string GetStringFromRunes(const string& s, RuneStrArray::const_iterator left, RuneStrArray::const_iterator right) {
|
||||||
|
assert(right->offset >= left->offset);
|
||||||
|
//uint32_t len = right->offset - left->offset + right->len;
|
||||||
|
return s.substr(left->offset, right->offset - left->offset + right->len);
|
||||||
|
}
|
||||||
|
|
||||||
|
inline void GetWordsFromWordRanges(const string& s, const vector<WordRange>& wrs, vector<Word>& words) {
|
||||||
|
for (size_t i = 0; i < wrs.size(); i++) {
|
||||||
|
words.push_back(GetWordFromRunes(s, wrs[i].left, wrs[i].right));
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
inline void GetWordsFromWordRanges(const string& s, const vector<WordRange>& wrs, vector<string>& words) {
|
||||||
|
for (size_t i = 0; i < wrs.size(); i++) {
|
||||||
|
words.push_back(GetStringFromRunes(s, wrs[i].left, wrs[i].right));
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
inline void GetStringsFromWords(const vector<Word>& words, vector<string>& strs) {
|
||||||
|
strs.resize(words.size());
|
||||||
|
|
||||||
|
for (size_t i = 0; i < words.size(); ++i) {
|
||||||
|
strs[i] = words[i].word;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
const size_t MAX_WORD_LENGTH = 512;
|
||||||
|
|
||||||
|
} // namespace cppjieba
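
`DecodeRunesInString` above classifies the lead byte with `<=` comparisons, so a stray continuation byte (`0x80` to `0xBF`) is accepted as a two-byte lead, and trailing bytes are not checked. A stricter variant could look like the sketch below. It is only an illustration under stated assumptions: `SketchUtf8SeqLen` and `SketchDecodeOne` are hypothetical names, and the sketch still does not reject overlong encodings or surrogate code points.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Returns the sequence length (1-4) implied by a UTF-8 lead byte,
// or 0 when the byte is a continuation byte or an invalid lead.
inline std::size_t SketchUtf8SeqLen(unsigned char lead) {
  if (lead < 0x80) return 1;            // 0xxxxxxx
  if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
  if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
  if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
  return 0;
}

// Decodes the code point starting at byte index i and advances i past it.
// Returns false on truncated input or a malformed continuation byte.
inline bool SketchDecodeOne(const std::string& s, std::size_t& i,
                            std::uint32_t& rune) {
  if (i >= s.size()) return false;
  const unsigned char lead = static_cast<unsigned char>(s[i]);
  const std::size_t len = SketchUtf8SeqLen(lead);
  if (len == 0 || i + len > s.size()) return false;
  // Payload bits carried by the lead byte, indexed by sequence length.
  static const unsigned char kLeadMask[5] = {0x00, 0x7f, 0x1f, 0x0f, 0x07};
  rune = lead & kLeadMask[len];
  for (std::size_t k = 1; k < len; ++k) {
    const unsigned char c = static_cast<unsigned char>(s[i + k]);
    if ((c & 0xC0) != 0x80) return false;  // must be 10xxxxxx
    rune = (rune << 6) | (c & 0x3f);
  }
  i += len;
  return true;
}
```

Masking the lead byte rather than comparing ranges keeps each branch tied to exactly one bit pattern, which is why the continuation-byte case falls through to the error path here.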
|
||||||
|
|
|
@ -0,0 +1,43 @@
|
||||||
|
INCLUDEPATH += $$PWD
|
||||||
|
|
||||||
|
HEADERS += \
|
||||||
|
$$PWD/DictTrie.hpp \
|
||||||
|
$$PWD/IdfTrie.hpp \
|
||||||
|
$$PWD/PinYinTrie.hpp \
|
||||||
|
$$PWD/FullSegment.hpp \
|
||||||
|
$$PWD/HMMModel.hpp \
|
||||||
|
$$PWD/HMMSegment.hpp \
|
||||||
|
$$PWD/Jieba.hpp \
|
||||||
|
$$PWD/KeywordExtractor.hpp \
|
||||||
|
$$PWD/MPSegment.hpp \
|
||||||
|
$$PWD/MixSegment.hpp \
|
||||||
|
$$PWD/PosTagger.hpp \
|
||||||
|
$$PWD/PreFilter.hpp \
|
||||||
|
$$PWD/QuerySegment.hpp \
|
||||||
|
$$PWD/SegmentBase.hpp \
|
||||||
|
$$PWD/SegmentTagged.hpp \
|
||||||
|
$$PWD/TextRankExtractor.hpp \
|
||||||
|
# $$PWD/Trie.hpp \
|
||||||
|
$$PWD/Unicode.hpp \
|
||||||
|
$$PWD/DatTrie.hpp \
|
||||||
|
$$PWD/idf-trie/idf-trie.h \
|
||||||
|
$$PWD/segment-trie/segment-trie.h
|
||||||
|
|
||||||
|
DISTFILES += \
|
||||||
|
dict/README.md \
|
||||||
|
dict/hmm_model.utf8 \
|
||||||
|
dict/idf.utf8 \
|
||||||
|
dict/jieba.dict.utf8 \
|
||||||
|
dict/pos_dict/char_state_tab.utf8 \
|
||||||
|
dict/pos_dict/prob_emit.utf8 \
|
||||||
|
dict/pos_dict/prob_start.utf8 \
|
||||||
|
dict/pos_dict/prob_trans.utf8 \
|
||||||
|
dict/stop_words.utf8 \
|
||||||
|
dict/user.dict.utf8
|
||||||
|
#dict/pinyinWithoutTone.txt \
|
||||||
|
|
||||||
|
include(limonp/limonp.pri)
|
||||||
|
|
||||||
|
SOURCES += \
|
||||||
|
$$PWD/idf-trie/idf-trie.cpp \
|
||||||
|
$$PWD/segment-trie/segment-trie.cpp
|
|
@ -0,0 +1,97 @@
|
||||||
|
/*
|
||||||
|
* Copyright (C) 2022, KylinSoft Co., Ltd.
|
||||||
|
*
|
||||||
|
* This program is free software: you can redistribute it and/or modify
|
||||||
|
* it under the terms of the GNU General Public License as published by
|
||||||
|
* the Free Software Foundation, either version 3 of the License, or
|
||||||
|
* (at your option) any later version.
|
||||||
|
*
|
||||||
|
* This program is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
* GNU General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU General Public License
|
||||||
|
* along with this program. If not, see <https://www.gnu.org/licenses/>.
|
||||||
|
*
|
||||||
|
* Authors: jixiaoxu <jixiaoxu@kylinos.cn>
|
||||||
|
*
|
||||||
|
*/
|
||||||
|
#include "idf-trie.h"
|
||||||
|
|
||||||
|
IdfTrie::IdfTrie(const vector<string> file_paths, string dat_cache_path)
|
||||||
|
: StorageBase<double, false, IdfCacheFileHeader>(file_paths, dat_cache_path)
|
||||||
|
{
|
||||||
|
this->Init();
|
||||||
|
}
|
||||||
|
|
||||||
|
IdfTrie::IdfTrie(string file_path, string dat_cache_path)
|
||||||
|
: StorageBase<double, false, IdfCacheFileHeader>(vector<string>{file_path}, dat_cache_path)
|
||||||
|
{
|
||||||
|
this->Init();
|
||||||
|
}
|
||||||
|
|
||||||
|
void IdfTrie::LoadSourceFile(const string &dat_cache_file, const string &md5)
|
||||||
|
{
|
||||||
|
IdfCacheFileHeader header;
|
||||||
|
assert(sizeof(header.md5_hex) == md5.size());
|
||||||
|
memcpy(&header.md5_hex[0], md5.c_str(), md5.size());
|
||||||
|
|
||||||
|
int offset(0), elements_num(0), write_bytes(0), data_trie_size(0);
|
||||||
|
double idf_sum(0), idf_average(0), tmp(0);
|
||||||
|
string tmp_filepath = string(dat_cache_file) + "_XXXXXX";
|
||||||
|
umask(S_IWGRP | S_IWOTH);
|
||||||
|
const int fd =mkstemp((char *)tmp_filepath.data());
|
||||||
|
assert(fd >= 0);
|
||||||
|
fchmod(fd, 0644);
|
||||||
|
|
||||||
|
write_bytes = write(fd, (const char *)&header, sizeof(IdfCacheFileHeader));
|
||||||
|
|
||||||
|
ifstream ifs(IDF_DICT_PATH);
|
||||||
|
string line;
|
||||||
|
vector<string> buf;
|
||||||
|
|
||||||
|
for (; getline(ifs, line);) {
|
||||||
|
if (limonp::StartsWith(line, "#") or line.empty()) {
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
limonp::Split(line, buf, " ");
|
||||||
|
if (buf.size() != 2)
|
||||||
|
continue;
|
||||||
|
this->Update(buf[0].c_str(), buf[0].size(), elements_num);
|
||||||
|
offset += sizeof(double);
|
||||||
|
elements_num++;
|
||||||
|
tmp = atof(buf[1].c_str());
|
||||||
|
write_bytes += write(fd, &tmp, sizeof(double));
|
||||||
|
idf_sum += tmp;
|
||||||
|
}
|
||||||
|
idf_average = elements_num > 0 ? idf_sum / elements_num : 0; // avoid dividing by zero when the dictionary is empty
|
||||||
|
write_bytes += write(fd, this->GetDataTrieArray(), this->GetDataTrieTotalSize());
|
||||||
|
|
||||||
|
lseek(fd, sizeof(header.md5_hex), SEEK_SET);
|
||||||
|
write(fd, &elements_num, sizeof(int));
|
||||||
|
write(fd, &offset, sizeof(int));
|
||||||
|
data_trie_size = this->GetDataTrieSize();
|
||||||
|
write(fd, &data_trie_size, sizeof(int));
|
||||||
|
write(fd, &idf_average, sizeof(double));
|
||||||
|
|
||||||
|
close(fd);
|
||||||
|
assert((size_t)write_bytes == sizeof(IdfCacheFileHeader) + offset + this->GetDataTrieTotalSize());
|
||||||
|
|
||||||
|
const auto rename_ret = rename(tmp_filepath.c_str(), dat_cache_file.c_str());
|
||||||
|
assert(0 == rename_ret);
|
||||||
|
}
|
||||||
|
|
||||||
|
double IdfTrie::Find(const string &key) const
|
||||||
|
{
|
||||||
|
int result = this->ExactMatchSearch(key.c_str(), key.size());
|
||||||
|
if (result < 0)
|
||||||
|
return -1;
|
||||||
|
return this->GetElementPtr()[result];
|
||||||
|
}
|
||||||
|
|
||||||
|
double IdfTrie::GetIdfAverage() const
|
||||||
|
{
|
||||||
|
return this->GetCacheFileHeaderPtr()->idf_average;
|
||||||
|
}
|
||||||
|
|
|
@ -0,0 +1,45 @@
|
||||||
|
/*
|
||||||
|
* Copyright (C) 2022, KylinSoft Co., Ltd.
|
||||||
|
*
|
||||||
|
* This program is free software: you can redistribute it and/or modify
|
||||||
|
* it under the terms of the GNU General Public License as published by
|
||||||
|
* the Free Software Foundation, either version 3 of the License, or
|
||||||
|
* (at your option) any later version.
|
||||||
|
*
|
||||||
|
* This program is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
* GNU General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU General Public License
|
||||||
|
* along with this program. If not, see <https://www.gnu.org/licenses/>.
|
||||||
|
*
|
||||||
|
* Authors: jixiaoxu <jixiaoxu@kylinos.cn>
|
||||||
|
*
|
||||||
|
*/
|
||||||
|
#ifndef IdfTrie_H
|
||||||
|
#define IdfTrie_H
|
||||||
|
|
||||||
|
#include "storage-base.hpp"
|
||||||
|
|
||||||
|
const char * const IDF_DICT_PATH = "/usr/share/ukui-search/res/dict/idf.utf8";
|
||||||
|
|
||||||
|
struct IdfCacheFileHeader : CacheFileHeaderBase
|
||||||
|
{
|
||||||
|
double idf_average = 0;
|
||||||
|
};
|
||||||
|
|
||||||
|
class IdfTrie : public StorageBase<double, false, IdfCacheFileHeader>
|
||||||
|
{
|
||||||
|
public:
|
||||||
|
IdfTrie(const vector<string> file_paths, string dat_cache_path);
|
||||||
|
IdfTrie(string file_path, string dat_cache_path);
|
||||||
|
void LoadSourceFile(const string &dat_cache_file, const string &md5) override;
|
||||||
|
double Find(const string &key) const;
|
||||||
|
double GetIdfAverage() const;
|
||||||
|
|
||||||
|
private:
|
||||||
|
|
||||||
|
};
|
||||||
|
|
||||||
|
#endif // IdfTrie_H
|
|
@ -0,0 +1,70 @@
|
||||||
|
/************************************
|
||||||
|
* file enc : ascii
|
||||||
|
* author : wuyanyi09@gmail.com
|
||||||
|
************************************/
|
||||||
|
|
||||||
|
#ifndef LIMONP_ARGV_FUNCTS_H
|
||||||
|
#define LIMONP_ARGV_FUNCTS_H
|
||||||
|
|
||||||
|
#include <set>
|
||||||
|
#include <sstream>
|
||||||
|
#include "StringUtil.hpp"
|
||||||
|
|
||||||
|
namespace limonp {
|
||||||
|
|
||||||
|
using namespace std;
|
||||||
|
|
||||||
|
class ArgvContext {
|
||||||
|
public :
|
||||||
|
ArgvContext(int argc, const char* const * argv) {
|
||||||
|
for(int i = 0; i < argc; i++) {
|
||||||
|
if(StartsWith(argv[i], "-")) {
|
||||||
|
if(i + 1 < argc && !StartsWith(argv[i + 1], "-")) {
|
||||||
|
mpss_[argv[i]] = argv[i+1];
|
||||||
|
i++;
|
||||||
|
} else {
|
||||||
|
sset_.insert(argv[i]);
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
args_.push_back(argv[i]);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
~ArgvContext() {
|
||||||
|
}
|
||||||
|
|
||||||
|
friend ostream& operator << (ostream& os, const ArgvContext& args);
|
||||||
|
string operator [](size_t i) const {
|
||||||
|
if(i < args_.size()) {
|
||||||
|
return args_[i];
|
||||||
|
}
|
||||||
|
return "";
|
||||||
|
}
|
||||||
|
string operator [](const string& key) const {
|
||||||
|
map<string, string>::const_iterator it = mpss_.find(key);
|
||||||
|
if(it != mpss_.end()) {
|
||||||
|
return it->second;
|
||||||
|
}
|
||||||
|
return "";
|
||||||
|
}
|
||||||
|
|
||||||
|
bool HasKey(const string& key) const {
|
||||||
|
if(mpss_.find(key) != mpss_.end() || sset_.find(key) != sset_.end()) {
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
private:
|
||||||
|
vector<string> args_;
|
||||||
|
map<string, string> mpss_;
|
||||||
|
set<string> sset_;
|
||||||
|
}; // class ArgvContext
|
||||||
|
|
||||||
|
inline ostream& operator << (ostream& os, const ArgvContext& args) {
|
||||||
|
return os<<args.args_<<args.mpss_<<args.sset_;
|
||||||
|
}
|
||||||
|
|
||||||
|
} // namespace limonp
|
||||||
|
|
||||||
|
#endif
|
|
@ -0,0 +1,49 @@
|
||||||
|
#ifndef LIMONP_BLOCKINGQUEUE_HPP
|
||||||
|
#define LIMONP_BLOCKINGQUEUE_HPP
|
||||||
|
|
||||||
|
#include <queue>
|
||||||
|
#include "Condition.hpp"
|
||||||
|
|
||||||
|
namespace limonp {
|
||||||
|
template<class T>
|
||||||
|
class BlockingQueue: NonCopyable {
|
||||||
|
public:
|
||||||
|
BlockingQueue()
|
||||||
|
: mutex_(), notEmpty_(mutex_), queue_() {
|
||||||
|
}
|
||||||
|
|
||||||
|
void Push(const T& x) {
|
||||||
|
MutexLockGuard lock(mutex_);
|
||||||
|
queue_.push(x);
|
||||||
|
notEmpty_.Notify(); // Wait morphing saves us
|
||||||
|
}
|
||||||
|
|
||||||
|
T Pop() {
|
||||||
|
MutexLockGuard lock(mutex_);
|
||||||
|
// always use a while-loop, due to spurious wakeup
|
||||||
|
while (queue_.empty()) {
|
||||||
|
notEmpty_.Wait();
|
||||||
|
}
|
||||||
|
assert(!queue_.empty());
|
||||||
|
T front(queue_.front());
|
||||||
|
queue_.pop();
|
||||||
|
return front;
|
||||||
|
}
|
||||||
|
|
||||||
|
size_t Size() const {
|
||||||
|
MutexLockGuard lock(mutex_);
|
||||||
|
return queue_.size();
|
||||||
|
}
|
||||||
|
bool Empty() const {
|
||||||
|
return Size() == 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
private:
|
||||||
|
mutable MutexLock mutex_;
|
||||||
|
Condition notEmpty_;
|
||||||
|
std::queue<T> queue_;
|
||||||
|
}; // class BlockingQueue
|
||||||
|
|
||||||
|
} // namespace limonp
|
||||||
|
|
||||||
|
#endif // LIMONP_BLOCKINGQUEUE_HPP
|
|
@ -0,0 +1,67 @@
|
||||||
|
#ifndef LIMONP_BOUNDED_BLOCKING_QUEUE_HPP
|
||||||
|
#define LIMONP_BOUNDED_BLOCKING_QUEUE_HPP
|
||||||
|
|
||||||
|
#include "BoundedQueue.hpp"
|
||||||
|
|
||||||
|
namespace limonp {
|
||||||
|
|
||||||
|
template<typename T>
|
||||||
|
class BoundedBlockingQueue : NonCopyable {
|
||||||
|
public:
|
||||||
|
explicit BoundedBlockingQueue(size_t maxSize)
|
||||||
|
: mutex_(),
|
||||||
|
notEmpty_(mutex_),
|
||||||
|
notFull_(mutex_),
|
||||||
|
queue_(maxSize) {
|
||||||
|
}
|
||||||
|
|
||||||
|
void Push(const T& x) {
|
||||||
|
MutexLockGuard lock(mutex_);
|
||||||
|
while (queue_.Full()) {
|
||||||
|
notFull_.Wait();
|
||||||
|
}
|
||||||
|
assert(!queue_.Full());
|
||||||
|
queue_.Push(x);
|
||||||
|
notEmpty_.Notify();
|
||||||
|
}
|
||||||
|
|
||||||
|
T Pop() {
|
||||||
|
MutexLockGuard lock(mutex_);
|
||||||
|
while (queue_.Empty()) {
|
||||||
|
notEmpty_.Wait();
|
||||||
|
}
|
||||||
|
assert(!queue_.Empty());
|
||||||
|
T res = queue_.Pop();
|
||||||
|
notFull_.Notify();
|
||||||
|
return res;
|
||||||
|
}
|
||||||
|
|
||||||
|
bool Empty() const {
|
||||||
|
MutexLockGuard lock(mutex_);
|
||||||
|
return queue_.Empty();
|
||||||
|
}
|
||||||
|
|
||||||
|
bool Full() const {
|
||||||
|
MutexLockGuard lock(mutex_);
|
||||||
|
return queue_.Full();
|
||||||
|
}
|
||||||
|
|
||||||
|
size_t size() const {
|
||||||
|
MutexLockGuard lock(mutex_);
|
||||||
|
return queue_.Size(); // BoundedQueue exposes Size(), not size()
|
||||||
|
}
|
||||||
|
|
||||||
|
size_t capacity() const {
|
||||||
|
return queue_.Capacity(); // BoundedQueue exposes Capacity(), not capacity()
|
||||||
|
}
|
||||||
|
|
||||||
|
private:
|
||||||
|
mutable MutexLock mutex_;
|
||||||
|
Condition notEmpty_;
|
||||||
|
Condition notFull_;
|
||||||
|
BoundedQueue<T> queue_;
|
||||||
|
}; // class BoundedBlockingQueue
|
||||||
|
|
||||||
|
} // namespace limonp
|
||||||
|
|
||||||
|
#endif // LIMONP_BOUNDED_BLOCKING_QUEUE_HPP
|
|
@ -0,0 +1,65 @@
|
||||||
|
#ifndef LIMONP_BOUNDED_QUEUE_HPP
|
||||||
|
#define LIMONP_BOUNDED_QUEUE_HPP
|
||||||
|
|
||||||
|
#include <vector>
|
||||||
|
#include <fstream>
|
||||||
|
#include <cassert>
|
||||||
|
|
||||||
|
namespace limonp {
|
||||||
|
using namespace std;
|
||||||
|
template<class T>
|
||||||
|
class BoundedQueue {
|
||||||
|
public:
|
||||||
|
explicit BoundedQueue(size_t capacity): capacity_(capacity), circular_buffer_(capacity) {
|
||||||
|
head_ = 0;
|
||||||
|
tail_ = 0;
|
||||||
|
size_ = 0;
|
||||||
|
assert(capacity_);
|
||||||
|
}
|
||||||
|
~BoundedQueue() {
|
||||||
|
}
|
||||||
|
|
||||||
|
void Clear() {
|
||||||
|
head_ = 0;
|
||||||
|
tail_ = 0;
|
||||||
|
size_ = 0;
|
||||||
|
}
|
||||||
|
bool Empty() const {
|
||||||
|
return !size_;
|
||||||
|
}
|
||||||
|
bool Full() const {
|
||||||
|
return capacity_ == size_;
|
||||||
|
}
|
||||||
|
size_t Size() const {
|
||||||
|
return size_;
|
||||||
|
}
|
||||||
|
size_t Capacity() const {
|
||||||
|
return capacity_;
|
||||||
|
}
|
||||||
|
|
||||||
|
void Push(const T& t) {
|
||||||
|
assert(!Full());
|
||||||
|
circular_buffer_[tail_] = t;
|
||||||
|
tail_ = (tail_ + 1) % capacity_;
|
||||||
|
size_ ++;
|
||||||
|
}
|
||||||
|
|
||||||
|
T Pop() {
|
||||||
|
assert(!Empty());
|
||||||
|
size_t oldPos = head_;
|
||||||
|
head_ = (head_ + 1) % capacity_;
|
||||||
|
size_ --;
|
||||||
|
return circular_buffer_[oldPos];
|
||||||
|
}
|
||||||
|
|
||||||
|
private:
|
||||||
|
size_t head_;
|
||||||
|
size_t tail_;
|
||||||
|
size_t size_;
|
||||||
|
const size_t capacity_;
|
||||||
|
vector<T> circular_buffer_;
|
||||||
|
|
||||||
|
}; // class BoundedQueue
|
||||||
|
} // namespace limonp
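
The head/tail arithmetic in `BoundedQueue` is easiest to follow with a wrap-around case. The sketch below re-creates the same idea in a stand-alone class so it can be compiled without limonp; `SketchRing` and `SketchWrapAround` are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal fixed-capacity FIFO over a circular buffer (illustrative only).
template <class T>
class SketchRing {
 public:
  explicit SketchRing(std::size_t capacity)
      : head_(0), tail_(0), size_(0), buf_(capacity) {
    assert(capacity > 0);
  }
  bool Empty() const { return size_ == 0; }
  bool Full() const { return size_ == buf_.size(); }
  void Push(const T& t) {
    assert(!Full());
    buf_[tail_] = t;
    tail_ = (tail_ + 1) % buf_.size();  // wrap to slot 0 past the end
    ++size_;
  }
  T Pop() {
    assert(!Empty());
    T out = buf_[head_];
    head_ = (head_ + 1) % buf_.size();
    --size_;
    return out;
  }

 private:
  std::size_t head_;
  std::size_t tail_;
  std::size_t size_;
  std::vector<T> buf_;
};

// Fills a capacity-3 ring, frees one slot, then pushes again so that
// the new element is stored in the slot that was just released.
inline std::vector<int> SketchWrapAround() {
  SketchRing<int> q(3);
  q.Push(1);
  q.Push(2);
  q.Push(3);
  std::vector<int> out;
  out.push_back(q.Pop());  // frees slot 0
  q.Push(4);               // tail has wrapped, so 4 lands in slot 0
  while (!q.Empty()) {
    out.push_back(q.Pop());
  }
  return out;  // FIFO order is preserved: 1, 2, 3, 4
}
```

Tracking `size_` separately, as the original does, is what lets `Full()` and `Empty()` be told apart when `head_ == tail_`.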
|
||||||
|
|
||||||
|
#endif
|
|
@ -0,0 +1,206 @@
|
||||||
|
#ifndef LIMONP_CLOSURE_HPP
|
||||||
|
#define LIMONP_CLOSURE_HPP
|
||||||
|
|
||||||
|
namespace limonp {
|
||||||
|
|
||||||
|
class ClosureInterface {
|
||||||
|
public:
|
||||||
|
virtual ~ClosureInterface() {
|
||||||
|
}
|
||||||
|
virtual void Run() = 0;
|
||||||
|
};
|
||||||
|
|
||||||
|
template <class Funct>
|
||||||
|
class Closure0: public ClosureInterface {
|
||||||
|
public:
|
||||||
|
Closure0(Funct fun) {
|
||||||
|
fun_ = fun;
|
||||||
|
}
|
||||||
|
virtual ~Closure0() {
|
||||||
|
}
|
||||||
|
virtual void Run() {
|
||||||
|
(*fun_)();
|
||||||
|
}
|
||||||
|
private:
|
||||||
|
Funct fun_;
|
||||||
|
};
|
||||||
|
|
||||||
|
template <class Funct, class Arg1>
|
||||||
|
class Closure1: public ClosureInterface {
|
||||||
|
public:
|
||||||
|
Closure1(Funct fun, Arg1 arg1) {
|
||||||
|
fun_ = fun;
|
||||||
|
arg1_ = arg1;
|
||||||
|
}
|
||||||
|
virtual ~Closure1() {
|
||||||
|
}
|
||||||
|
virtual void Run() {
|
||||||
|
(*fun_)(arg1_);
|
||||||
|
}
|
||||||
|
private:
|
||||||
|
Funct fun_;
|
||||||
|
Arg1 arg1_;
|
||||||
|
};
|
||||||
|
|
||||||
|
template <class Funct, class Arg1, class Arg2>
|
||||||
|
class Closure2: public ClosureInterface {
|
||||||
|
public:
|
||||||
|
Closure2(Funct fun, Arg1 arg1, Arg2 arg2) {
|
||||||
|
fun_ = fun;
|
||||||
|
arg1_ = arg1;
|
||||||
|
arg2_ = arg2;
|
||||||
|
}
|
||||||
|
virtual ~Closure2() {
|
||||||
|
}
|
||||||
|
virtual void Run() {
|
||||||
|
(*fun_)(arg1_, arg2_);
|
||||||
|
}
|
||||||
|
private:
|
||||||
|
Funct fun_;
|
||||||
|
Arg1 arg1_;
|
||||||
|
Arg2 arg2_;
|
||||||
|
};
|
||||||
|
|
||||||
|
template <class Funct, class Arg1, class Arg2, class Arg3>
|
||||||
|
class Closure3: public ClosureInterface {
|
||||||
|
public:
|
||||||
|
Closure3(Funct fun, Arg1 arg1, Arg2 arg2, Arg3 arg3) {
|
||||||
|
fun_ = fun;
|
||||||
|
arg1_ = arg1;
|
||||||
|
arg2_ = arg2;
|
||||||
|
arg3_ = arg3;
|
||||||
|
}
|
||||||
|
virtual ~Closure3() {
|
||||||
|
}
|
||||||
|
virtual void Run() {
|
||||||
|
(*fun_)(arg1_, arg2_, arg3_);
|
||||||
|
}
|
||||||
|
private:
|
||||||
|
Funct fun_;
|
||||||
|
Arg1 arg1_;
|
||||||
|
Arg2 arg2_;
|
||||||
|
Arg3 arg3_;
|
||||||
|
};
|
||||||
|
|
||||||
|
template <class Obj, class Funct>
|
||||||
|
class ObjClosure0: public ClosureInterface {
|
||||||
|
public:
|
||||||
|
ObjClosure0(Obj* p, Funct fun) {
|
||||||
|
p_ = p;
|
||||||
|
fun_ = fun;
|
||||||
|
}
|
||||||
|
virtual ~ObjClosure0() {
|
||||||
|
}
|
||||||
|
virtual void Run() {
|
||||||
|
(p_->*fun_)();
|
||||||
|
}
|
||||||
|
private:
|
||||||
|
Obj* p_;
|
||||||
|
Funct fun_;
|
||||||
|
};
|
||||||
|
|
||||||
|
template <class Obj, class Funct, class Arg1>
|
||||||
|
class ObjClosure1: public ClosureInterface {
|
||||||
|
public:
|
||||||
|
ObjClosure1(Obj* p, Funct fun, Arg1 arg1) {
|
||||||
|
p_ = p;
|
||||||
|
fun_ = fun;
|
||||||
|
arg1_ = arg1;
|
||||||
|
}
|
||||||
|
virtual ~ObjClosure1() {
|
||||||
|
}
|
||||||
|
virtual void Run() {
|
||||||
|
(p_->*fun_)(arg1_);
|
||||||
|
}
|
||||||
|
private:
|
||||||
|
Obj* p_;
|
||||||
|
Funct fun_;
|
||||||
|
Arg1 arg1_;
|
||||||
|
};
|
||||||
|
|
||||||
|
template <class Obj, class Funct, class Arg1, class Arg2>
|
||||||
|
class ObjClosure2: public ClosureInterface {
|
||||||
|
public:
|
||||||
|
ObjClosure2(Obj* p, Funct fun, Arg1 arg1, Arg2 arg2) {
|
||||||
|
p_ = p;
|
||||||
|
fun_ = fun;
|
||||||
|
arg1_ = arg1;
|
||||||
|
arg2_ = arg2;
|
||||||
|
}
|
||||||
|
virtual ~ObjClosure2() {
|
||||||
|
}
|
||||||
|
virtual void Run() {
|
||||||
|
(p_->*fun_)(arg1_, arg2_);
|
||||||
|
}
|
||||||
|
private:
|
||||||
|
Obj* p_;
|
||||||
|
Funct fun_;
|
||||||
|
Arg1 arg1_;
|
||||||
|
Arg2 arg2_;
|
||||||
|
};
|
||||||
|
template <class Obj, class Funct, class Arg1, class Arg2, class Arg3>
|
||||||
|
class ObjClosure3: public ClosureInterface {
|
||||||
|
public:
|
||||||
|
ObjClosure3(Obj* p, Funct fun, Arg1 arg1, Arg2 arg2, Arg3 arg3) {
|
||||||
|
p_ = p;
|
||||||
|
fun_ = fun;
|
||||||
|
arg1_ = arg1;
|
||||||
|
arg2_ = arg2;
|
||||||
|
arg3_ = arg3;
|
||||||
|
}
|
||||||
|
virtual ~ObjClosure3() {
|
||||||
|
}
|
||||||
|
virtual void Run() {
|
||||||
|
(p_->*fun_)(arg1_, arg2_, arg3_);
|
||||||
|
}
|
||||||
|
private:
|
||||||
|
Obj* p_;
|
||||||
|
Funct fun_;
|
||||||
|
Arg1 arg1_;
|
||||||
|
Arg2 arg2_;
|
||||||
|
Arg3 arg3_;
|
||||||
|
};
|
||||||
|
|
||||||
|
template<class R>
|
||||||
|
ClosureInterface* NewClosure(R (*fun)()) {
|
||||||
|
return new Closure0<R (*)()>(fun);
|
||||||
|
}
|
||||||
|
|
||||||
|
template<class R, class Arg1>
|
||||||
|
ClosureInterface* NewClosure(R (*fun)(Arg1), Arg1 arg1) {
|
||||||
|
return new Closure1<R (*)(Arg1), Arg1>(fun, arg1);
|
||||||
|
}
|
||||||
|
|
||||||
|
template<class R, class Arg1, class Arg2>
|
||||||
|
ClosureInterface* NewClosure(R (*fun)(Arg1, Arg2), Arg1 arg1, Arg2 arg2) {
|
||||||
|
return new Closure2<R (*)(Arg1, Arg2), Arg1, Arg2>(fun, arg1, arg2);
|
||||||
|
}
|
||||||
|
|
||||||
|
template<class R, class Arg1, class Arg2, class Arg3>
|
||||||
|
ClosureInterface* NewClosure(R (*fun)(Arg1, Arg2, Arg3), Arg1 arg1, Arg2 arg2, Arg3 arg3) {
|
||||||
|
return new Closure3<R (*)(Arg1, Arg2, Arg3), Arg1, Arg2, Arg3>(fun, arg1, arg2, arg3);
|
||||||
|
}
|
||||||
|
|
||||||
|
template<class R, class Obj>
|
||||||
|
ClosureInterface* NewClosure(Obj* obj, R (Obj::* fun)()) {
|
||||||
|
return new ObjClosure0<Obj, R (Obj::* )()>(obj, fun);
|
||||||
|
}
|
||||||
|
|
||||||
|
template<class R, class Obj, class Arg1>
|
||||||
|
ClosureInterface* NewClosure(Obj* obj, R (Obj::* fun)(Arg1), Arg1 arg1) {
|
||||||
|
return new ObjClosure1<Obj, R (Obj::* )(Arg1), Arg1>(obj, fun, arg1);
|
||||||
|
}
|
||||||
|
|
||||||
|
template<class R, class Obj, class Arg1, class Arg2>
|
||||||
|
ClosureInterface* NewClosure(Obj* obj, R (Obj::* fun)(Arg1, Arg2), Arg1 arg1, Arg2 arg2) {
|
||||||
|
return new ObjClosure2<Obj, R (Obj::*)(Arg1, Arg2), Arg1, Arg2>(obj, fun, arg1, arg2);
|
||||||
|
}
|
||||||
|
|
||||||
|
template<class R, class Obj, class Arg1, class Arg2, class Arg3>
|
||||||
|
ClosureInterface* NewClosure(Obj* obj, R (Obj::* fun)(Arg1, Arg2, Arg3), Arg1 arg1, Arg2 arg2, Arg3 arg3) {
|
||||||
|
return new ObjClosure3<Obj, R (Obj::*)(Arg1, Arg2, Arg3), Arg1, Arg2, Arg3>(obj, fun, arg1, arg2, arg3);
|
||||||
|
}
|
||||||
|
|
||||||
|
} // namespace limonp
|
||||||
|
|
||||||
|
#endif // LIMONP_CLOSURE_HPP
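Review note: a minimal usage sketch for the member-function `NewClosure` overloads above. To keep it compilable on its own, `ClosureInterface`, `ObjClosure1` and one `NewClosure` overload are re-declared in a hypothetical `sketch` namespace; `Counter` and `RunAndDelete` are illustrative names that do not appear in the header. The header returns a raw `new` pointer, so this sketch assumes the caller is responsible for deleting it.

```cpp
#include <cassert>

namespace sketch {

// Same shape as limonp::ClosureInterface: a type-erased "run later" handle.
class ClosureInterface {
 public:
  virtual ~ClosureInterface() {}
  virtual void Run() = 0;
};

// Binds an object, a member-function pointer and one argument.
template <class Obj, class Funct, class Arg1>
class ObjClosure1 : public ClosureInterface {
 public:
  ObjClosure1(Obj* p, Funct fun, Arg1 arg1) : p_(p), fun_(fun), arg1_(arg1) {}
  virtual void Run() { (p_->*fun_)(arg1_); }

 private:
  Obj* p_;
  Funct fun_;
  Arg1 arg1_;
};

// Factory: deduces R, Obj and Arg1 from the member-function pointer.
template <class R, class Obj, class Arg1>
ClosureInterface* NewClosure(Obj* obj, R (Obj::*fun)(Arg1), Arg1 arg1) {
  return new ObjClosure1<Obj, R (Obj::*)(Arg1), Arg1>(obj, fun, arg1);
}

// Hypothetical receiver, used only for this example.
struct Counter {
  int total;
  Counter() : total(0) {}
  void Add(int n) { total += n; }
};

// Builds a closure, runs it once, frees it, and reports the new total.
inline int RunAndDelete(Counter* counter, int n) {
  ClosureInterface* c = NewClosure(counter, &Counter::Add, n);
  c->Run();
  delete c;  // assumption: the caller owns the closure
  return counter->total;
}

}  // namespace sketch
```

Because the bound argument is stored by value, later changes to the caller's variable do not affect what `Run()` passes to the member function.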
|
|
@ -0,0 +1,31 @@
|
||||||
|
#ifndef LIMONP_COLOR_PRINT_HPP
|
||||||
|
#define LIMONP_COLOR_PRINT_HPP
|
||||||
|
|
||||||
|
#include <string>
|
||||||
|
#include <stdarg.h>
#include <stdio.h>
|
||||||
|
|
||||||
|
namespace limonp {
|
||||||
|
|
||||||
|
using std::string;
|
||||||
|
|
||||||
|
enum Color {
|
||||||
|
BLACK = 30,
|
||||||
|
RED,
|
||||||
|
GREEN,
|
||||||
|
YELLOW,
|
||||||
|
BLUE,
|
||||||
|
PURPLE
|
||||||
|
}; // enum Color
|
||||||
|
|
||||||
|
static void ColorPrintln(enum Color color, const char * fmt, ...) {
|
||||||
|
va_list ap;
|
||||||
|
printf("\033[0;%dm", color);
|
||||||
|
va_start(ap, fmt);
|
||||||
|
vprintf(fmt, ap);
|
||||||
|
va_end(ap);
|
||||||
|
printf("\033[0m\n"); // reset the color and end the line in one write; without the \n, following lines sometimes keep the color
|
||||||
|
}
|
||||||
|
|
||||||
|
} // namespace limonp
|
||||||
|
|
||||||
|
#endif // LIMONP_COLOR_PRINT_HPP
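Review note: `ColorPrintln` wraps its output in ANSI SGR escape sequences. The sketch below builds the same sequence into a string instead of printing it, which makes the byte layout easier to inspect. `Colorize` is a hypothetical helper, not part of limonp, and the enum is re-declared in a `sketch` namespace only so the block compiles on its own.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

namespace sketch {

// Same numbering as limonp::Color: ANSI foreground codes start at 30.
enum Color { BLACK = 30, RED, GREEN, YELLOW, BLUE, PURPLE };

// Returns text wrapped as ESC[0;<color>m ... ESC[0m (set color, then reset).
inline std::string Colorize(Color color, const std::string& text) {
  char prefix[16];
  std::snprintf(prefix, sizeof(prefix), "\033[0;%dm", static_cast<int>(color));
  return std::string(prefix) + text + "\033[0m";
}

}  // namespace sketch
```

Unlike `ColorPrintln`, this helper appends no newline; a caller that prints the result would add one after the reset sequence.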
|
|
@ -0,0 +1,38 @@
|
||||||
|
#ifndef LIMONP_CONDITION_HPP
|
||||||
|
#define LIMONP_CONDITION_HPP
|
||||||
|
|
||||||
|
#include "MutexLock.hpp"
|
||||||
|
|
||||||
|
namespace limonp {
|
||||||
|
|
||||||
|
class Condition : NonCopyable {
|
||||||
|
public:
|
||||||
|
explicit Condition(MutexLock& mutex)
|
||||||
|
: mutex_(mutex) {
|
||||||
|
XCHECK(!pthread_cond_init(&pcond_, NULL));
|
||||||
|
}
|
||||||
|
|
||||||
|
~Condition() {
|
||||||
|
XCHECK(!pthread_cond_destroy(&pcond_));
|
||||||
|
}
|
||||||
|
|
||||||
|
void Wait() {
|
||||||
|
XCHECK(!pthread_cond_wait(&pcond_, mutex_.GetPthreadMutex()));
|
||||||
|
}
|
||||||
|
|
||||||
|
void Notify() {
|
||||||
|
XCHECK(!pthread_cond_signal(&pcond_));
|
||||||
|
}
|
||||||
|
|
||||||
|
void NotifyAll() {
|
||||||
|
XCHECK(!pthread_cond_broadcast(&pcond_));
|
||||||
|
}
|
||||||
|
|
||||||
|
private:
|
||||||
|
MutexLock& mutex_;
|
||||||
|
pthread_cond_t pcond_;
|
||||||
|
}; // class Condition
|
||||||
|
|
||||||
|
} // namespace limonp
|
||||||
|
|
||||||
|
#endif // LIMONP_CONDITION_HPP
|
|
@ -0,0 +1,103 @@
|
||||||
|
/************************************
|
||||||
|
* file enc : utf8
|
||||||
|
* author : wuyanyi09@gmail.com
|
||||||
|
************************************/
|
||||||
|
#ifndef LIMONP_CONFIG_H
|
||||||
|
#define LIMONP_CONFIG_H
|
||||||
|
|
||||||
|
#include <map>
|
||||||
|
#include <fstream>
|
||||||
|
#include <iostream>
|
||||||
|
#include <assert.h>
|
||||||
|
#include "StringUtil.hpp"
|
||||||
|
|
||||||
|
namespace limonp {
|
||||||
|
|
||||||
|
using namespace std;
|
||||||
|
|
||||||
|
class Config {
|
||||||
|
public:
|
||||||
|
explicit Config(const string& filePath) {
|
||||||
|
LoadFile(filePath);
|
||||||
|
}
|
||||||
|
|
||||||
|
operator bool () {
|
||||||
|
return !map_.empty();
|
||||||
|
}
|
||||||
|
|
||||||
|
string Get(const string& key, const string& defaultvalue) const {
|
||||||
|
map<string, string>::const_iterator it = map_.find(key);
|
||||||
|
if(map_.end() != it) {
|
||||||
|
return it->second;
|
||||||
|
}
|
||||||
|
return defaultvalue;
|
||||||
|
}
|
||||||
|
int Get(const string& key, int defaultvalue) const {
|
||||||
|
string str = Get(key, "");
|
||||||
|
if("" == str) {
|
||||||
|
return defaultvalue;
|
||||||
|
}
|
||||||
|
return atoi(str.c_str());
|
||||||
|
}
|
||||||
|
const char* operator [] (const char* key) const {
|
||||||
|
if(NULL == key) {
|
||||||
|
return NULL;
|
||||||
|
}
|
||||||
|
map<string, string>::const_iterator it = map_.find(key);
|
||||||
|
if(map_.end() != it) {
|
||||||
|
return it->second.c_str();
|
||||||
|
}
|
||||||
|
return NULL;
|
||||||
|
}
|
||||||
|
|
||||||
|
string GetConfigInfo() const {
|
||||||
|
string res;
|
||||||
|
res << *this;
|
||||||
|
return res;
|
||||||
|
}
|
||||||
|
|
||||||
|
private:
|
||||||
|
void LoadFile(const string& filePath) {
|
||||||
|
ifstream ifs(filePath.c_str());
|
||||||
|
assert(ifs);
|
||||||
|
string line;
|
||||||
|
vector<string> vecBuf;
|
||||||
|
size_t lineno = 0;
|
||||||
|
while(getline(ifs, line)) {
|
||||||
|
lineno ++;
|
||||||
|
Trim(line);
|
||||||
|
if(line.empty() || StartsWith(line, "#")) {
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
vecBuf.clear();
|
||||||
|
Split(line, vecBuf, "=");
|
||||||
|
if(2 != vecBuf.size()) {
|
||||||
|
fprintf(stderr, "line[%s] illegal.\n", line.c_str());
|
||||||
|
assert(false);
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
string& key = vecBuf[0];
|
||||||
|
string& value = vecBuf[1];
|
||||||
|
Trim(key);
|
||||||
|
Trim(value);
|
||||||
|
if(!map_.insert(make_pair(key, value)).second) {
|
||||||
|
fprintf(stderr, "key[%s] already exists.\n", key.c_str());
|
||||||
|
assert(false);
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
ifs.close();
|
||||||
|
}
|
||||||
|
|
||||||
|
friend ostream& operator << (ostream& os, const Config& config);
|
||||||
|
|
||||||
|
map<string, string> map_;
|
||||||
|
}; // class Config
|
||||||
|
|
||||||
|
inline ostream& operator << (ostream& os, const Config& config) {
|
||||||
|
return os << config.map_;
|
||||||
|
}
|
||||||
|
|
||||||
|
} // namespace limonp
|
||||||
|
|
||||||
|
#endif // LIMONP_CONFIG_H
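Review note: the parsing rules in `Config::LoadFile` (trim each line, skip blanks and `#` comments, split on `=`, reject duplicate keys) can be sketched without the `StringUtil.hpp` dependency. Everything in the `sketch` namespace below is hypothetical. Two deliberate differences from the header: malformed input returns `false` instead of hitting `assert(false)`, and "exactly two fields" is approximated as "exactly one `=` in the line", which may not match `Split` in every edge case.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

namespace sketch {

typedef std::map<std::string, std::string> ConfigMap;

// Strips leading and trailing spaces, tabs, CR and LF.
inline std::string Trim(const std::string& s) {
  const char* ws = " \t\r\n";
  std::string::size_type b = s.find_first_not_of(ws);
  if (b == std::string::npos) return "";
  std::string::size_type e = s.find_last_not_of(ws);
  return s.substr(b, e - b + 1);
}

// Parses "key = value" lines; false on a malformed line or a duplicate key.
inline bool ParseConfig(std::istream& in, ConfigMap& out) {
  std::string line;
  while (std::getline(in, line)) {
    line = Trim(line);
    if (line.empty() || line[0] == '#') continue;  // blank line or comment
    std::string::size_type eq = line.find('=');
    if (eq == std::string::npos) return false;     // no separator
    if (line.find('=', eq + 1) != std::string::npos) return false;  // too many
    std::string key = Trim(line.substr(0, eq));
    std::string value = Trim(line.substr(eq + 1));
    if (!out.insert(std::make_pair(key, value)).second) return false;
  }
  return true;
}

// Convenience wrapper: parse text and look up one key, with a default.
inline std::string ParseAndGet(const std::string& text, const std::string& key,
                               const std::string& defaultvalue) {
  std::istringstream in(text);
  ConfigMap m;
  if (!ParseConfig(in, m)) return defaultvalue;
  ConfigMap::const_iterator it = m.find(key);
  return it == m.end() ? defaultvalue : it->second;
}

}  // namespace sketch
```

Returning an error instead of asserting keeps release builds (where `assert` compiles away) from silently continuing past a bad line.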
|
|
@ -0,0 +1,74 @@
|
||||||
|
#ifndef LIMONP_FILELOCK_HPP
|
||||||
|
#define LIMONP_FILELOCK_HPP
|
||||||
|
|
||||||
|
#include <unistd.h>
|
||||||
|
#include <stdlib.h>
|
||||||
|
#include <stdio.h>
|
||||||
|
#include <fcntl.h>
|
||||||
|
#include <errno.h>
|
||||||
|
#include <string>
|
||||||
|
#include <string.h>
|
||||||
|
#include <assert.h>
|
||||||
|
|
||||||
|
namespace limonp {
|
||||||
|
|
||||||
|
using std::string;
|
||||||
|
|
||||||
|
class FileLock {
|
||||||
|
public:
|
||||||
|
FileLock() : fd_(-1), ok_(true) {
|
||||||
|
}
|
||||||
|
~FileLock() {
|
||||||
|
if(fd_ >= 0) {
|
||||||
|
Close();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
void Open(const string& fname) {
|
||||||
|
assert(fd_ == -1);
|
||||||
|
fd_ = open(fname.c_str(), O_RDWR | O_CREAT, 0644);
|
||||||
|
if(fd_ < 0) {
|
||||||
|
ok_ = false;
|
||||||
|
err_ = strerror(errno);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
void Close() {
|
||||||
|
::close(fd_);
|
||||||
|
}
|
||||||
|
void Lock() {
|
||||||
|
if(LockOrUnlock(fd_, true) < 0) {
|
||||||
|
ok_ = false;
|
||||||
|
err_ = strerror(errno);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
void UnLock() {
|
||||||
|
if(LockOrUnlock(fd_, false) < 0) {
|
||||||
|
ok_ = false;
|
||||||
|
err_ = strerror(errno);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
bool Ok() const {
|
||||||
|
return ok_;
|
||||||
|
}
|
||||||
|
string Error() const {
|
||||||
|
return err_;
|
||||||
|
}
|
||||||
|
private:
|
||||||
|
static int LockOrUnlock(int fd, bool lock) {
|
||||||
|
errno = 0;
|
||||||
|
struct flock f;
|
||||||
|
memset(&f, 0, sizeof(f));
|
||||||
|
f.l_type = (lock ? F_WRLCK : F_UNLCK);
|
||||||
|
f.l_whence = SEEK_SET;
|
||||||
|
f.l_start = 0;
|
||||||
|
f.l_len = 0; // Lock/unlock entire file
|
||||||
|
return fcntl(fd, F_SETLK, &f);
|
||||||
|
}
|
||||||
|
|
||||||
|
int fd_;
|
||||||
|
bool ok_;
|
||||||
|
string err_;
|
||||||
|
}; // class FileLock
|
||||||
|
|
||||||
|
}// namespace limonp
|
||||||
|
|
||||||
|
#endif // LIMONP_FILELOCK_HPP
|
|
@ -0,0 +1,7 @@
|
||||||
|
#ifndef LIMONP_FORCE_PUBLIC_H
|
||||||
|
#define LIMONP_FORCE_PUBLIC_H
|
||||||
|
|
||||||
|
#define private public
|
||||||
|
#define protected public
|
||||||
|
|
||||||
|
#endif // LIMONP_FORCE_PUBLIC_H
|
|
@ -0,0 +1,142 @@
|
||||||
|
#ifndef LIMONP_LOCAL_VECTOR_HPP
|
||||||
|
#define LIMONP_LOCAL_VECTOR_HPP
|
||||||
|
|
||||||
|
#include <iostream>
|
||||||
|
#include <stdlib.h>
|
||||||
|
#include <assert.h>
|
||||||
|
#include <string.h>
|
||||||
|
|
||||||
|
namespace limonp {
|
||||||
|
using namespace std;
|
||||||
|
/*
|
||||||
|
* LocalVector<T> : T must be a primitive type (char, int, size_t). Elements are copied with memcpy, so LocalVector<T> may be unsafe if T is a struct or class.
|
||||||
|
* LocalVector<T> is simple and not well-tested.
|
||||||
|
*/
|
||||||
|
const size_t LOCAL_VECTOR_BUFFER_SIZE = 16;
|
||||||
|
template <class T>
|
||||||
|
class LocalVector {
|
||||||
|
public:
|
||||||
|
typedef const T* const_iterator ;
|
||||||
|
typedef T value_type;
|
||||||
|
typedef size_t size_type;
|
||||||
|
private:
|
||||||
|
T buffer_[LOCAL_VECTOR_BUFFER_SIZE];
|
||||||
|
T * ptr_;
|
||||||
|
size_t size_;
|
||||||
|
size_t capacity_;
|
||||||
|
public:
|
||||||
|
LocalVector() {
|
||||||
|
init_();
|
||||||
|
};
|
||||||
|
LocalVector(const LocalVector<T>& vec) {
|
||||||
|
init_();
|
||||||
|
*this = vec;
|
||||||
|
}
|
||||||
|
LocalVector(const_iterator begin, const_iterator end) { // TODO: make it faster
|
||||||
|
init_();
|
||||||
|
while(begin != end) {
|
||||||
|
push_back(*begin++);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
LocalVector(size_t size, const T& t) { // TODO: make it faster
|
||||||
|
init_();
|
||||||
|
while(size--) {
|
||||||
|
push_back(t);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
~LocalVector() {
|
||||||
|
if(ptr_ != buffer_) {
|
||||||
|
free(ptr_);
|
||||||
|
}
|
||||||
|
};
|
||||||
|
public:
|
||||||
|
LocalVector<T>& operator = (const LocalVector<T>& vec) {
|
||||||
|
if(this == &vec){
|
||||||
|
return *this;
|
||||||
|
}
|
||||||
|
clear();
|
||||||
|
size_ = vec.size();
|
||||||
|
capacity_ = vec.capacity();
|
||||||
|
if(vec.buffer_ == vec.ptr_) {
|
||||||
|
memcpy(buffer_, vec.buffer_, sizeof(T) * size_);
|
||||||
|
ptr_ = buffer_;
|
||||||
|
} else {
|
||||||
|
ptr_ = (T*) malloc(vec.capacity() * sizeof(T));
|
||||||
|
assert(ptr_);
|
||||||
|
memcpy(ptr_, vec.ptr_, vec.size() * sizeof(T));
|
||||||
|
}
|
||||||
|
return *this;
|
||||||
|
}
|
||||||
|
private:
|
||||||
|
void init_() {
|
||||||
|
ptr_ = buffer_;
|
||||||
|
size_ = 0;
|
||||||
|
capacity_ = LOCAL_VECTOR_BUFFER_SIZE;
|
||||||
|
}
|
||||||
|
public:
|
||||||
|
T& operator [] (size_t i) {
|
||||||
|
return ptr_[i];
|
||||||
|
}
|
||||||
|
const T& operator [] (size_t i) const {
|
||||||
|
return ptr_[i];
|
||||||
|
}
|
||||||
|
void push_back(const T& t) {
|
||||||
|
if(size_ == capacity_) {
|
||||||
|
assert(capacity_);
|
||||||
|
reserve(capacity_ * 2);
|
||||||
|
}
|
||||||
|
ptr_[size_ ++ ] = t;
|
||||||
|
}
|
||||||
|
void reserve(size_t size) {
|
||||||
|
if(size <= capacity_) {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
T * next = (T*)malloc(sizeof(T) * size);
|
||||||
|
assert(next);
|
||||||
|
T * old = ptr_;
|
||||||
|
ptr_ = next;
|
||||||
|
memcpy(ptr_, old, sizeof(T) * capacity_);
|
||||||
|
capacity_ = size;
|
||||||
|
if(old != buffer_) {
|
||||||
|
free(old);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
bool empty() const {
|
||||||
|
return 0 == size();
|
||||||
|
}
|
||||||
|
size_t size() const {
|
||||||
|
return size_;
|
||||||
|
}
|
||||||
|
size_t capacity() const {
|
||||||
|
return capacity_;
|
||||||
|
}
|
||||||
|
const_iterator begin() const {
|
||||||
|
return ptr_;
|
||||||
|
}
|
||||||
|
const_iterator end() const {
|
||||||
|
return ptr_ + size_;
|
||||||
|
}
|
||||||
|
void clear() {
|
||||||
|
if(ptr_ != buffer_) {
|
||||||
|
free(ptr_);
|
||||||
|
}
|
||||||
|
init_();
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
template <class T>
|
||||||
|
ostream & operator << (ostream& os, const LocalVector<T>& vec) {
|
||||||
|
if(vec.empty()) {
|
||||||
|
return os << "[]";
|
||||||
|
}
|
||||||
|
os<<"[\""<<vec[0];
|
||||||
|
for(size_t i = 1; i < vec.size(); i++) {
|
||||||
|
os<<"\", \""<<vec[i];
|
||||||
|
}
|
||||||
|
os<<"\"]";
|
||||||
|
return os;
|
||||||
|
}
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
#endif
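Review note: `LocalVector` keeps its first `LOCAL_VECTOR_BUFFER_SIZE` elements in an inline array and only moves to `malloc`-ed storage when that fills up. The sketch below shows the same strategy for `int` only. `SmallIntVector`, `kInlineCapacity`, `on_heap` and `SpillsAfter` are hypothetical names, the inline capacity is reduced to 4 to keep the example short, and growth copies `size_` elements rather than `capacity_` as the header does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

namespace sketch {

const std::size_t kInlineCapacity = 4;  // hypothetical; the header uses 16

class SmallIntVector {
 public:
  SmallIntVector() : ptr_(buffer_), size_(0), capacity_(kInlineCapacity) {}
  ~SmallIntVector() {
    if (ptr_ != buffer_) std::free(ptr_);  // only heap storage is freed
  }

  void push_back(int v) {
    if (size_ == capacity_) Grow(capacity_ * 2);
    ptr_[size_++] = v;
  }
  int operator[](std::size_t i) const { return ptr_[i]; }
  std::size_t size() const { return size_; }
  bool on_heap() const { return ptr_ != buffer_; }

 private:
  SmallIntVector(const SmallIntVector&);             // copying omitted here
  SmallIntVector& operator=(const SmallIntVector&);

  // Moves the elements to a larger malloc-ed block; memcpy is valid for int.
  void Grow(std::size_t n) {
    int* next = static_cast<int*>(std::malloc(sizeof(int) * n));
    assert(next);
    std::memcpy(next, ptr_, sizeof(int) * size_);
    if (ptr_ != buffer_) std::free(ptr_);
    ptr_ = next;
    capacity_ = n;
  }

  int buffer_[kInlineCapacity];
  int* ptr_;
  std::size_t size_;
  std::size_t capacity_;
};

// Reports whether pushing n elements forces a heap allocation.
inline bool SpillsAfter(std::size_t n) {
  SmallIntVector v;
  for (std::size_t i = 0; i < n; ++i) v.push_back(static_cast<int>(i));
  return v.on_heap();
}

}  // namespace sketch
```

The reliance on `memcpy` is also why the header restricts `T` to primitive types: a type with a non-trivial copy constructor or destructor would be moved without either being run.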
|
|
@ -0,0 +1,77 @@
|
||||||
|
#ifndef LIMONP_LOGGING_HPP
|
||||||
|
#define LIMONP_LOGGING_HPP
|
||||||
|
|
||||||
|
#include <sstream>
|
||||||
|
#include <iostream>
|
||||||
|
#include <cassert>
|
||||||
|
#include <cstdlib>
|
||||||
|
#include <ctime>
|
||||||
|
|
||||||
|
#ifdef XLOG
|
||||||
|
#error "XLOG has been defined already"
|
||||||
|
#endif // XLOG
|
||||||
|
#ifdef XCHECK
|
||||||
|
#error "XCHECK has been defined already"
|
||||||
|
#endif // XCHECK
|
||||||
|
|
||||||
|
#define XLOG(level) limonp::Logger(limonp::LL_##level, __FILE__, __LINE__).Stream()
|
||||||
|
#define XCHECK(exp) if(!(exp)) XLOG(FATAL) << "exp: ["#exp << "] false. "
|
||||||
|
|
||||||
|
namespace limonp {
|
||||||
|
|
||||||
|
enum {
|
||||||
|
LL_DEBUG = 0,
|
||||||
|
LL_INFO = 1,
|
||||||
|
LL_WARNING = 2,
|
||||||
|
LL_ERROR = 3,
|
||||||
|
LL_FATAL = 4,
|
||||||
|
}; // enum
|
||||||
|
|
||||||
|
static const char * LOG_LEVEL_ARRAY[] = {"DEBUG","INFO","WARN","ERROR","FATAL"};
|
||||||
|
|
||||||
|
class Logger {
|
||||||
|
public:
|
||||||
|
Logger(size_t level, const char* filename, int lineno)
|
||||||
|
: level_(level) {
|
||||||
|
#ifdef LOGGING_LEVEL
|
||||||
|
if (level_ < LOGGING_LEVEL) {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
#endif
|
||||||
|
assert(level_ < sizeof(LOG_LEVEL_ARRAY)/sizeof(*LOG_LEVEL_ARRAY));
|
||||||
|
char buf[32];
|
||||||
|
time_t now;
|
||||||
|
time(&now);
|
||||||
|
struct tm result;
|
||||||
|
localtime_r(&now, &result);
|
||||||
|
strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S", &result);
|
||||||
|
stream_ << buf
|
||||||
|
<< " " << filename
|
||||||
|
<< ":" << lineno
|
||||||
|
<< " " << LOG_LEVEL_ARRAY[level_]
|
||||||
|
<< " ";
|
||||||
|
}
|
||||||
|
~Logger() {
|
||||||
|
#ifdef LOGGING_LEVEL
|
||||||
|
if (level_ < LOGGING_LEVEL) {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
#endif
|
||||||
|
std::cerr << stream_.str() << std::endl;
|
||||||
|
if (level_ == LL_FATAL) {
|
||||||
|
abort();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
std::ostream& Stream() {
|
||||||
|
return stream_;
|
||||||
|
}
|
||||||
|
|
||||||
|
private:
|
||||||
|
std::ostringstream stream_;
|
||||||
|
size_t level_;
|
||||||
|
}; // class Logger
|
||||||
|
|
||||||
|
} // namespace limonp
|
||||||
|
|
||||||
|
#endif // LIMONP_LOGGING_HPP
|
|
@ -0,0 +1,415 @@
|
||||||
|
/***************************************************************************
|
||||||
|
*Copyright (C) 1991-2, RSA Data Security, Inc. Created 1991
|
||||||
|
* 2020, KylinSoft Co., Ltd.
|
||||||
|
*All rights reserved.
|
||||||
|
*
|
||||||
|
*License to copy and use this software is granted provided that it
|
||||||
|
*is identified as the "RSA Data Security, Inc. MD5 Message-Digest
|
||||||
|
*Algorithm" in all material mentioning or referencing this software
|
||||||
|
*or this function.
|
||||||
|
*
|
||||||
|
*License is also granted to make and use derivative works provided
|
||||||
|
*that such works are identified as "derived from the RSA Data
|
||||||
|
*Security, Inc. MD5 Message-Digest Algorithm" in all material
|
||||||
|
*mentioning or referencing the derived work.
|
||||||
|
*
|
||||||
|
*RSA Data Security, Inc. makes no representations concerning either
|
||||||
|
*the merchantability of this software or the suitability of this
|
||||||
|
*software for any particular purpose. It is provided "as is"
|
||||||
|
*without express or implied warranty of any kind.
|
||||||
|
*
|
||||||
|
*These notices must be retained in any copies of any part of this
|
||||||
|
*documentation and/or software.
|
||||||
|
*
|
||||||
|
*
|
||||||
|
*
|
||||||
|
*The original md5 implementation avoids external libraries.
|
||||||
|
*This version has dependency on stdio.h for file input and
|
||||||
|
*string.h for memcpy.
|
||||||
|
*
|
||||||
|
***************************************************************************/
|
||||||
|
|
||||||
|
#ifndef __MD5_H__
|
||||||
|
#define __MD5_H__
|
||||||
|
#include <cstdio>
|
||||||
|
#include <cstring>
|
||||||
|
#include <iostream>
|
||||||
|
|
||||||
|
namespace limonp {
|
||||||
|
|
||||||
|
//#pragma region MD5 defines
|
||||||
|
// Constants for MD5Transform routine.
|
||||||
|
#define S11 7
|
||||||
|
#define S12 12
|
||||||
|
#define S13 17
|
||||||
|
#define S14 22
|
||||||
|
#define S21 5
|
||||||
|
#define S22 9
|
||||||
|
#define S23 14
|
||||||
|
#define S24 20
|
||||||
|
#define S31 4
|
||||||
|
#define S32 11
|
||||||
|
#define S33 16
|
||||||
|
#define S34 23
|
||||||
|
#define S41 6
|
||||||
|
#define S42 10
|
||||||
|
#define S43 15
|
||||||
|
#define S44 21
|
||||||
|
|
||||||
|
|
||||||
|
// F, G, H and I are basic MD5 functions.
|
||||||
|
#define F(x, y, z) (((x) & (y)) | ((~x) & (z)))
|
||||||
|
#define G(x, y, z) (((x) & (z)) | ((y) & (~z)))
|
||||||
|
#define H(x, y, z) ((x) ^ (y) ^ (z))
|
||||||
|
#define I(x, y, z) ((y) ^ ((x) | (~z)))
|
||||||
|
|
||||||
|
// ROTATE_LEFT rotates x left n bits.
|
||||||
|
#define ROTATE_LEFT(x, n) (((x) << (n)) | ((x) >> (32-(n))))
|
||||||
|
|
||||||
|
// FF, GG, HH, and II transformations for rounds 1, 2, 3, and 4.
|
||||||
|
// Rotation is separate from addition to prevent recomputation.
|
||||||
|
#define FF(a, b, c, d, x, s, ac) { \
|
||||||
|
(a) += F ((b), (c), (d)) + (x) + (UINT4)(ac); \
|
||||||
|
(a) = ROTATE_LEFT ((a), (s)); \
|
||||||
|
(a) += (b); \
|
||||||
|
}
|
||||||
|
#define GG(a, b, c, d, x, s, ac) { \
|
||||||
|
(a) += G ((b), (c), (d)) + (x) + (UINT4)(ac); \
|
||||||
|
(a) = ROTATE_LEFT ((a), (s)); \
|
||||||
|
(a) += (b); \
|
||||||
|
}
|
||||||
|
#define HH(a, b, c, d, x, s, ac) { \
|
||||||
|
(a) += H ((b), (c), (d)) + (x) + (UINT4)(ac); \
|
||||||
|
(a) = ROTATE_LEFT ((a), (s)); \
|
||||||
|
(a) += (b); \
|
||||||
|
}
|
||||||
|
#define II(a, b, c, d, x, s, ac) { \
|
||||||
|
(a) += I ((b), (c), (d)) + (x) + (UINT4)(ac); \
|
||||||
|
(a) = ROTATE_LEFT ((a), (s)); \
|
||||||
|
(a) += (b); \
|
||||||
|
}
|
||||||
|
//#pragma endregion
|
||||||
|
|
||||||
|
|
||||||
|
typedef unsigned char BYTE ;
|
||||||
|
|
||||||
|
// POINTER defines a generic pointer type
|
||||||
|
typedef unsigned char *POINTER;
|
||||||
|
|
||||||
|
// UINT2 defines a two byte word
|
||||||
|
typedef unsigned short int UINT2;
|
||||||
|
|
||||||
|
// UINT4 defines a four byte word
|
||||||
|
typedef unsigned int UINT4;
|
||||||
|
|
||||||
|
static unsigned char PADDING[64] = {
|
||||||
|
0x80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
||||||
|
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
|
||||||
|
};
|
||||||
|
// convenient object that wraps
|
||||||
|
// the C-functions for use in C++ only
|
||||||
|
class MD5 {
|
||||||
|
private:
|
||||||
|
struct __context_t {
|
||||||
|
UINT4 state[4]; /* state (ABCD) */
|
||||||
|
UINT4 count[2]; /* number of bits, modulo 2^64 (lsb first) */
|
||||||
|
unsigned char buffer[64]; /* input buffer */
|
||||||
|
} context ;
|
||||||
|
|
||||||
|
//#pragma region static helper functions
|
||||||
|
// The core of the MD5 algorithm is here.
|
||||||
|
// MD5 basic transformation. Transforms state based on block.
|
||||||
|
static void MD5Transform(UINT4 state[4], unsigned char block[64]) {
|
||||||
|
UINT4 a = state[0], b = state[1], c = state[2], d = state[3], x[16];
|
||||||
|
|
||||||
|
Decode(x, block, 64);
|
||||||
|
|
||||||
|
/* Round 1 */
|
||||||
|
FF(a, b, c, d, x[ 0], S11, 0xd76aa478); /* 1 */
|
||||||
|
FF(d, a, b, c, x[ 1], S12, 0xe8c7b756); /* 2 */
|
||||||
|
FF(c, d, a, b, x[ 2], S13, 0x242070db); /* 3 */
|
||||||
|
FF(b, c, d, a, x[ 3], S14, 0xc1bdceee); /* 4 */
|
||||||
|
FF(a, b, c, d, x[ 4], S11, 0xf57c0faf); /* 5 */
|
||||||
|
FF(d, a, b, c, x[ 5], S12, 0x4787c62a); /* 6 */
|
||||||
|
FF(c, d, a, b, x[ 6], S13, 0xa8304613); /* 7 */
|
||||||
|
FF(b, c, d, a, x[ 7], S14, 0xfd469501); /* 8 */
|
||||||
|
FF(a, b, c, d, x[ 8], S11, 0x698098d8); /* 9 */
|
||||||
|
FF(d, a, b, c, x[ 9], S12, 0x8b44f7af); /* 10 */
|
||||||
|
FF(c, d, a, b, x[10], S13, 0xffff5bb1); /* 11 */
|
||||||
|
FF(b, c, d, a, x[11], S14, 0x895cd7be); /* 12 */
|
||||||
|
FF(a, b, c, d, x[12], S11, 0x6b901122); /* 13 */
|
||||||
|
FF(d, a, b, c, x[13], S12, 0xfd987193); /* 14 */
|
||||||
|
FF(c, d, a, b, x[14], S13, 0xa679438e); /* 15 */
|
||||||
|
FF(b, c, d, a, x[15], S14, 0x49b40821); /* 16 */
|
||||||
|
|
||||||
|
/* Round 2 */
|
||||||
|
GG(a, b, c, d, x[ 1], S21, 0xf61e2562); /* 17 */
|
||||||
|
GG(d, a, b, c, x[ 6], S22, 0xc040b340); /* 18 */
|
||||||
|
GG(c, d, a, b, x[11], S23, 0x265e5a51); /* 19 */
|
||||||
|
GG(b, c, d, a, x[ 0], S24, 0xe9b6c7aa); /* 20 */
|
||||||
|
GG(a, b, c, d, x[ 5], S21, 0xd62f105d); /* 21 */
|
||||||
|
GG(d, a, b, c, x[10], S22, 0x2441453); /* 22 */
|
||||||
|
GG(c, d, a, b, x[15], S23, 0xd8a1e681); /* 23 */
|
||||||
|
GG(b, c, d, a, x[ 4], S24, 0xe7d3fbc8); /* 24 */
|
||||||
|
GG(a, b, c, d, x[ 9], S21, 0x21e1cde6); /* 25 */
|
||||||
|
GG(d, a, b, c, x[14], S22, 0xc33707d6); /* 26 */
|
||||||
|
GG(c, d, a, b, x[ 3], S23, 0xf4d50d87); /* 27 */
|
||||||
|
GG(b, c, d, a, x[ 8], S24, 0x455a14ed); /* 28 */
|
||||||
|
GG(a, b, c, d, x[13], S21, 0xa9e3e905); /* 29 */
|
||||||
|
GG(d, a, b, c, x[ 2], S22, 0xfcefa3f8); /* 30 */
|
||||||
|
GG(c, d, a, b, x[ 7], S23, 0x676f02d9); /* 31 */
|
||||||
|
GG(b, c, d, a, x[12], S24, 0x8d2a4c8a); /* 32 */
|
||||||
|
|
||||||
|
/* Round 3 */
|
||||||
|
HH(a, b, c, d, x[ 5], S31, 0xfffa3942); /* 33 */
|
||||||
|
HH(d, a, b, c, x[ 8], S32, 0x8771f681); /* 34 */
|
||||||
|
HH(c, d, a, b, x[11], S33, 0x6d9d6122); /* 35 */
|
||||||
|
HH(b, c, d, a, x[14], S34, 0xfde5380c); /* 36 */
|
||||||
|
HH(a, b, c, d, x[ 1], S31, 0xa4beea44); /* 37 */
|
||||||
|
HH(d, a, b, c, x[ 4], S32, 0x4bdecfa9); /* 38 */
|
||||||
|
HH(c, d, a, b, x[ 7], S33, 0xf6bb4b60); /* 39 */
|
||||||
|
HH(b, c, d, a, x[10], S34, 0xbebfbc70); /* 40 */
|
||||||
|
HH(a, b, c, d, x[13], S31, 0x289b7ec6); /* 41 */
|
||||||
|
HH(d, a, b, c, x[ 0], S32, 0xeaa127fa); /* 42 */
|
||||||
|
HH(c, d, a, b, x[ 3], S33, 0xd4ef3085); /* 43 */
|
||||||
|
HH(b, c, d, a, x[ 6], S34, 0x4881d05); /* 44 */
|
||||||
|
HH(a, b, c, d, x[ 9], S31, 0xd9d4d039); /* 45 */
|
||||||
|
HH(d, a, b, c, x[12], S32, 0xe6db99e5); /* 46 */
|
||||||
|
HH(c, d, a, b, x[15], S33, 0x1fa27cf8); /* 47 */
|
||||||
|
HH(b, c, d, a, x[ 2], S34, 0xc4ac5665); /* 48 */
|
||||||
|
|
||||||
|
/* Round 4 */
|
||||||
|
II(a, b, c, d, x[ 0], S41, 0xf4292244); /* 49 */
|
||||||
|
II(d, a, b, c, x[ 7], S42, 0x432aff97); /* 50 */
|
||||||
|
II(c, d, a, b, x[14], S43, 0xab9423a7); /* 51 */
|
||||||
|
II(b, c, d, a, x[ 5], S44, 0xfc93a039); /* 52 */
|
||||||
|
II(a, b, c, d, x[12], S41, 0x655b59c3); /* 53 */
|
||||||
|
II(d, a, b, c, x[ 3], S42, 0x8f0ccc92); /* 54 */
|
||||||
|
II(c, d, a, b, x[10], S43, 0xffeff47d); /* 55 */
|
||||||
|
II(b, c, d, a, x[ 1], S44, 0x85845dd1); /* 56 */
|
||||||
|
II(a, b, c, d, x[ 8], S41, 0x6fa87e4f); /* 57 */
|
||||||
|
II(d, a, b, c, x[15], S42, 0xfe2ce6e0); /* 58 */
|
||||||
|
II(c, d, a, b, x[ 6], S43, 0xa3014314); /* 59 */
|
||||||
|
II(b, c, d, a, x[13], S44, 0x4e0811a1); /* 60 */
|
||||||
|
II(a, b, c, d, x[ 4], S41, 0xf7537e82); /* 61 */
|
||||||
|
II(d, a, b, c, x[11], S42, 0xbd3af235); /* 62 */
|
||||||
|
II(c, d, a, b, x[ 2], S43, 0x2ad7d2bb); /* 63 */
|
||||||
|
II(b, c, d, a, x[ 9], S44, 0xeb86d391); /* 64 */
|
||||||
|
|
||||||
|
state[0] += a;
|
||||||
|
state[1] += b;
|
||||||
|
state[2] += c;
|
||||||
|
state[3] += d;
|
||||||
|
|
||||||
|
// Zeroize sensitive information.
|
||||||
|
memset((POINTER)x, 0, sizeof(x));
|
||||||
|
}
|
||||||
|
|
||||||
|
// Encodes input (UINT4) into output (unsigned char). Assumes len is
|
||||||
|
// a multiple of 4.
|
||||||
|
static void Encode(unsigned char *output, UINT4 *input, unsigned int len) {
|
||||||
|
unsigned int i, j;
|
||||||
|
|
||||||
|
for(i = 0, j = 0; j < len; i++, j += 4) {
|
||||||
|
output[j] = (unsigned char)(input[i] & 0xff);
|
||||||
|
output[j + 1] = (unsigned char)((input[i] >> 8) & 0xff);
|
||||||
|
output[j + 2] = (unsigned char)((input[i] >> 16) & 0xff);
|
||||||
|
output[j + 3] = (unsigned char)((input[i] >> 24) & 0xff);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Decodes input (unsigned char) into output (UINT4). Assumes len is
|
||||||
|
// a multiple of 4.
|
||||||
|
static void Decode(UINT4 *output, unsigned char *input, unsigned int len) {
|
||||||
|
unsigned int i, j;
|
||||||
|
|
||||||
|
for(i = 0, j = 0; j < len; i++, j += 4)
|
||||||
|
output[i] = ((UINT4)input[j]) | (((UINT4)input[j + 1]) << 8) |
|
||||||
|
(((UINT4)input[j + 2]) << 16) | (((UINT4)input[j + 3]) << 24);
|
||||||
|
}
|
||||||
|
//#pragma endregion
|
||||||
|
|
||||||
|
|
||||||
|
public:
|
||||||
|
// MAIN FUNCTIONS
|
||||||
|
MD5() {
|
||||||
|
Init() ;
|
||||||
|
}
|
||||||
|
|
||||||
|
// MD5 initialization. Begins an MD5 operation, writing a new context.
|
||||||
|
void Init() {
|
||||||
|
context.count[0] = context.count[1] = 0;
|
||||||
|
|
||||||
|
// Load magic initialization constants.
|
||||||
|
context.state[0] = 0x67452301;
|
||||||
|
context.state[1] = 0xefcdab89;
|
||||||
|
context.state[2] = 0x98badcfe;
|
||||||
|
context.state[3] = 0x10325476;
|
||||||
|
}
|
||||||
|
|
||||||
|
// MD5 block update operation. Continues an MD5 message-digest
|
||||||
|
// operation, processing another message block, and updating the
|
||||||
|
// context.
|
||||||
|
void Update(
|
||||||
|
unsigned char *input, // input block
|
||||||
|
unsigned int inputLen) { // length of input block
|
||||||
|
unsigned int i, index, partLen;
|
||||||
|
|
||||||
|
// Compute number of bytes mod 64
|
||||||
|
index = (unsigned int)((context.count[0] >> 3) & 0x3F);
|
||||||
|
|
||||||
|
// Update number of bits
|
||||||
|
if((context.count[0] += ((UINT4)inputLen << 3))
|
||||||
|
< ((UINT4)inputLen << 3))
|
||||||
|
context.count[1]++;
|
||||||
|
context.count[1] += ((UINT4)inputLen >> 29);
|
||||||
|
|
||||||
|
partLen = 64 - index;
|
||||||
|
|
||||||
|
// Transform as many times as possible.
|
||||||
|
if(inputLen >= partLen) {
|
||||||
|
memcpy((POINTER)&context.buffer[index], (POINTER)input, partLen);
|
||||||
|
MD5Transform(context.state, context.buffer);
|
||||||
|
|
||||||
|
for(i = partLen; i + 63 < inputLen; i += 64)
|
||||||
|
MD5Transform(context.state, &input[i]);
|
||||||
|
|
||||||
|
index = 0;
|
||||||
|
} else
|
||||||
|
i = 0;
|
||||||
|
|
||||||
|
/* Buffer remaining input */
|
||||||
|
memcpy((POINTER)&context.buffer[index], (POINTER)&input[i], inputLen - i);
|
||||||
|
}
|
||||||
|
|
||||||
|
// MD5 finalization. Ends an MD5 message-digest operation, writing the
|
||||||
|
// message digest and zeroizing the context.
|
||||||
|
// Writes to digestRaw
|
||||||
|
void Final() {
|
||||||
|
unsigned char bits[8];
|
||||||
|
unsigned int index, padLen;
|
||||||
|
|
||||||
|
// Save number of bits
|
||||||
|
Encode(bits, context.count, 8);
|
||||||
|
|
||||||
|
// Pad out to 56 mod 64.
|
||||||
|
index = (unsigned int)((context.count[0] >> 3) & 0x3f);
|
||||||
|
padLen = (index < 56) ? (56 - index) : (120 - index);
|
||||||
|
Update(PADDING, padLen);
|
||||||
|
|
||||||
|
// Append length (before padding)
|
||||||
|
Update(bits, 8);
|
||||||
|
|
||||||
|
// Store state in digest
|
||||||
|
Encode(digestRaw, context.state, 16);
|
||||||
|
|
||||||
|
// Zeroize sensitive information.
|
||||||
|
memset((POINTER)&context, 0, sizeof(context));
|
||||||
|
|
||||||
|
writeToString() ;
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Buffer must be 32+1 (nul) = 33 chars long at least
|
||||||
|
void writeToString() {
|
||||||
|
int pos ;
|
||||||
|
|
||||||
|
for(pos = 0 ; pos < 16 ; pos++)
|
||||||
|
sprintf(digestChars + (pos * 2), "%02x", digestRaw[pos]) ;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
public:
|
||||||
|
// an MD5 digest is a 16-byte number (32 hex digits)
|
||||||
|
BYTE digestRaw[ 16 ] ;
|
||||||
|
|
||||||
|
// This version of the digest is actually
|
||||||
|
// a "printf'd" version of the digest.
|
||||||
|
char digestChars[ 33 ] ;
|
||||||
|
|
||||||
|
/// Load a file from disk and digest it
|
||||||
|
// Digests a file and returns the result.
|
||||||
|
const char* digestFile(const char *filename) {
|
||||||
|
if(NULL == filename || strcmp(filename, "") == 0)
|
||||||
|
return NULL;
|
||||||
|
|
||||||
|
Init() ;
|
||||||
|
|
||||||
|
FILE *file;
|
||||||
|
|
||||||
|
unsigned char buffer[1024] ;
|
||||||
|
|
||||||
|
if((file = fopen(filename, "rb")) == NULL) {
|
||||||
|
return NULL;
|
||||||
|
}
|
||||||
|
int len;
|
||||||
|
while((len = fread(buffer, 1, 1024, file)))
|
||||||
|
Update(buffer, len) ;
|
||||||
|
Final();
|
||||||
|
|
||||||
|
fclose(file);
|
||||||
|
|
||||||
|
return digestChars ;
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Digests a byte-array already in memory
|
||||||
|
const char* digestMemory(BYTE *memchunk, int len) {
|
||||||
|
if(NULL == memchunk)
|
||||||
|
return NULL;
|
||||||
|
|
||||||
|
Init() ;
|
||||||
|
Update(memchunk, len) ;
|
||||||
|
Final() ;
|
||||||
|
|
||||||
|
return digestChars ;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Digests a NUL-terminated string and returns the result as a hex string.
|
||||||
|
const char* digestString(const char *string) {
|
||||||
|
if(string == NULL)
|
||||||
|
return NULL;
|
||||||
|
|
||||||
|
Init() ;
|
||||||
|
Update((unsigned char*)string, strlen(string)) ;
|
||||||
|
Final() ;
|
||||||
|
|
||||||
|
return digestChars ;
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
inline bool md5String(const char* str, std::string& res) {
|
||||||
|
if(NULL == str) {
|
||||||
|
res = "";
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
MD5 md5;
|
||||||
|
const char *pRes = md5.digestString(str);
|
||||||
|
if(NULL == pRes) {
|
||||||
|
res = "";
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
res = pRes;
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
|
||||||
|
inline bool md5File(const char* filepath, std::string& res) {
|
||||||
|
if(NULL == filepath || strcmp(filepath, "") == 0) {
|
||||||
|
res = "";
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
MD5 md5;
|
||||||
|
const char *pRes = md5.digestFile(filepath);
|
||||||
|
|
||||||
|
if(NULL == pRes) {
|
||||||
|
res = "";
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
res = pRes;
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
#endif
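Review note: three building blocks of the MD5 code above are easy to check in isolation: the `ROTATE_LEFT` macro, the round-1 function `F`, and the little-endian byte order used by `Encode`/`Decode`. The sketch rewrites them as functions over `uint32_t`. All names in the `sketch` namespace are hypothetical, and using `uint32_t` is an assumption that differs from the header, which relies on `unsigned int` being 32 bits wide.

```cpp
#include <cassert>
#include <stdint.h>

namespace sketch {

// Rotates x left by n bits, valid for 0 < n < 32 (same formula as ROTATE_LEFT).
inline uint32_t RotateLeft(uint32_t x, unsigned n) {
  return (x << n) | (x >> (32 - n));
}

// Round-1 auxiliary function: bitwise "if x then y else z".
inline uint32_t F(uint32_t x, uint32_t y, uint32_t z) {
  return (x & y) | (~x & z);
}

// Stores one word least-significant byte first, as MD5::Encode does.
inline void EncodeWord(unsigned char out[4], uint32_t w) {
  out[0] = static_cast<unsigned char>(w & 0xff);
  out[1] = static_cast<unsigned char>((w >> 8) & 0xff);
  out[2] = static_cast<unsigned char>((w >> 16) & 0xff);
  out[3] = static_cast<unsigned char>((w >> 24) & 0xff);
}

// Inverse of EncodeWord, as MD5::Decode does for each group of 4 bytes.
inline uint32_t DecodeWord(const unsigned char in[4]) {
  return static_cast<uint32_t>(in[0]) |
         (static_cast<uint32_t>(in[1]) << 8) |
         (static_cast<uint32_t>(in[2]) << 16) |
         (static_cast<uint32_t>(in[3]) << 24);
}

}  // namespace sketch
```

Encoding the first initialization constant `0x67452301` this way yields the bytes `01 23 45 67`, which is why the MD5 initial state reads as an ascending byte sequence in memory.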
|
|
@ -0,0 +1,51 @@
|
||||||
|
#ifndef LIMONP_MUTEX_LOCK_HPP
|
||||||
|
#define LIMONP_MUTEX_LOCK_HPP
|
||||||
|
|
||||||
|
#include <pthread.h>
|
||||||
|
#include "NonCopyable.hpp"
|
||||||
|
#include "Logging.hpp"
|
||||||
|
|
||||||
|
namespace limonp {
|
||||||
|
|
||||||
|
class MutexLock: NonCopyable {
|
||||||
|
public:
|
||||||
|
MutexLock() {
|
||||||
|
XCHECK(!pthread_mutex_init(&mutex_, NULL));
|
||||||
|
}
|
||||||
|
~MutexLock() {
|
||||||
|
XCHECK(!pthread_mutex_destroy(&mutex_));
|
||||||
|
}
|
||||||
|
pthread_mutex_t* GetPthreadMutex() {
|
||||||
|
return &mutex_;
|
||||||
|
}
|
||||||
|
|
||||||
|
private:
|
||||||
|
void Lock() {
|
||||||
|
XCHECK(!pthread_mutex_lock(&mutex_));
|
||||||
|
}
|
||||||
|
void Unlock() {
|
||||||
|
XCHECK(!pthread_mutex_unlock(&mutex_));
|
||||||
|
}
|
||||||
|
friend class MutexLockGuard;
|
||||||
|
|
||||||
|
pthread_mutex_t mutex_;
|
||||||
|
}; // class MutexLock
|
||||||
|
|
||||||
|
class MutexLockGuard: NonCopyable {
|
||||||
|
public:
|
||||||
|
explicit MutexLockGuard(MutexLock & mutex)
|
||||||
|
: mutex_(mutex) {
|
||||||
|
mutex_.Lock();
|
||||||
|
}
|
||||||
|
~MutexLockGuard() {
|
||||||
|
mutex_.Unlock();
|
||||||
|
}
|
||||||
|
private:
|
||||||
|
MutexLock & mutex_;
|
||||||
|
}; // class MutexLockGuard
|
||||||
|
|
||||||
|
#define MutexLockGuard(x) XCHECK(false);
|
||||||
|
|
||||||
|
} // namespace limonp
|
||||||
|
|
||||||
|
#endif // LIMONP_MUTEX_LOCK_HPP
|
|
@ -0,0 +1,21 @@
|
||||||
|
/************************************
|
||||||
|
************************************/
|
||||||
|
#ifndef LIMONP_NONCOPYABLE_H
|
||||||
|
#define LIMONP_NONCOPYABLE_H
|
||||||
|
|
||||||
|
namespace limonp {
|
||||||
|
|
||||||
|
class NonCopyable {
|
||||||
|
protected:
|
||||||
|
NonCopyable() {
|
||||||
|
}
|
||||||
|
~NonCopyable() {
|
||||||
|
}
|
||||||
|
private:
|
||||||
|
NonCopyable(const NonCopyable& );
|
||||||
|
const NonCopyable& operator=(const NonCopyable& );
|
||||||
|
}; // class NonCopyable
|
||||||
|
|
||||||
|
} // namespace limonp
|
||||||
|
|
||||||
|
#endif // LIMONP_NONCOPYABLE_H
|
|
@ -0,0 +1,157 @@
|
||||||
|
#ifndef LIMONP_STD_EXTENSION_HPP
|
||||||
|
#define LIMONP_STD_EXTENSION_HPP
|
||||||
|
|
||||||
|
#include <map>
|
||||||
|
|
||||||
|
#ifdef __APPLE__
|
||||||
|
#include <unordered_map>
|
||||||
|
#include <unordered_set>
|
||||||
|
#elif(__cplusplus >= 201103L)
|
||||||
|
#include <unordered_map>
|
||||||
|
#include <unordered_set>
|
||||||
|
#elif defined _MSC_VER
|
||||||
|
#include <unordered_map>
|
||||||
|
#include <unordered_set>
|
||||||
|
#else
|
||||||
|
#include <tr1/unordered_map>
|
||||||
|
#include <tr1/unordered_set>
|
||||||
|
namespace std {
|
||||||
|
using std::tr1::unordered_map;
|
||||||
|
using std::tr1::unordered_set;
|
||||||
|
}
|
||||||
|
|
||||||
|
#endif
|
||||||
|
|
||||||
|
#include <set>
|
||||||
|
#include <string>
|
||||||
|
#include <vector>
|
||||||
|
#include <deque>
|
||||||
|
#include <fstream>
|
||||||
|
#include <sstream>
|
||||||
|
|
||||||
|
namespace std {
|
||||||
|
|
||||||
|
template<typename T>
|
||||||
|
ostream& operator << (ostream& os, const vector<T>& v) {
|
||||||
|
if(v.empty()) {
|
||||||
|
return os << "[]";
|
||||||
|
}
|
||||||
|
os<<"["<<v[0];
|
||||||
|
for(size_t i = 1; i < v.size(); i++) {
|
||||||
|
os<<", "<<v[i];
|
||||||
|
}
|
||||||
|
os<<"]";
|
||||||
|
return os;
|
||||||
|
}
|
||||||
|
|
||||||
|
template<>
|
||||||
|
inline ostream& operator << (ostream& os, const vector<string>& v) {
|
||||||
|
if(v.empty()) {
|
||||||
|
return os << "[]";
|
||||||
|
}
|
||||||
|
os<<"[\""<<v[0];
|
||||||
|
for(size_t i = 1; i < v.size(); i++) {
|
||||||
|
os<<"\", \""<<v[i];
|
||||||
|
}
|
||||||
|
os<<"\"]";
|
||||||
|
return os;
|
||||||
|
}
|
||||||
|
|
||||||
|
template<typename T>
|
||||||
|
ostream& operator << (ostream& os, const deque<T>& dq) {
|
||||||
|
if(dq.empty()) {
|
||||||
|
return os << "[]";
|
||||||
|
}
|
||||||
|
os<<"[\""<<dq[0];
|
||||||
|
for(size_t i = 1; i < dq.size(); i++) {
|
||||||
|
os<<"\", \""<<dq[i];
|
||||||
|
}
|
||||||
|
os<<"\"]";
|
||||||
|
return os;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
template<class T1, class T2>
|
||||||
|
ostream& operator << (ostream& os, const pair<T1, T2>& pr) {
|
||||||
|
os << pr.first << ":" << pr.second ;
|
||||||
|
return os;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
template<class T>
|
||||||
|
string& operator << (string& str, const T& obj) {
|
||||||
|
stringstream ss;
|
||||||
|
ss << obj; // call ostream& operator << (ostream& os,
|
||||||
|
return str = ss.str();
|
||||||
|
}
|
||||||
|
|
||||||
|
template<class T1, class T2>
|
||||||
|
ostream& operator << (ostream& os, const map<T1, T2>& mp) {
|
||||||
|
if(mp.empty()) {
|
||||||
|
os<<"{}";
|
||||||
|
return os;
|
||||||
|
}
|
||||||
|
os<<'{';
|
||||||
|
typename map<T1, T2>::const_iterator it = mp.begin();
|
||||||
|
os<<*it;
|
||||||
|
it++;
|
||||||
|
while(it != mp.end()) {
|
||||||
|
os<<", "<<*it;
|
||||||
|
it++;
|
||||||
|
}
|
||||||
|
os<<'}';
|
||||||
|
return os;
|
||||||
|
}
|
||||||
|
template<class T1, class T2>
|
||||||
|
ostream& operator << (ostream& os, const std::unordered_map<T1, T2>& mp) {
|
||||||
|
if(mp.empty()) {
|
||||||
|
return os << "{}";
|
||||||
|
}
|
||||||
|
os<<'{';
|
||||||
|
typename std::unordered_map<T1, T2>::const_iterator it = mp.begin();
|
||||||
|
os<<*it;
|
||||||
|
it++;
|
||||||
|
while(it != mp.end()) {
|
||||||
|
os<<", "<<*it++;
|
||||||
|
}
|
||||||
|
return os<<'}';
|
||||||
|
}
|
||||||
|
|
||||||
|
template<class T>
|
||||||
|
ostream& operator << (ostream& os, const set<T>& st) {
|
||||||
|
if(st.empty()) {
|
||||||
|
os << "{}";
|
||||||
|
return os;
|
||||||
|
}
|
||||||
|
os<<'{';
|
||||||
|
typename set<T>::const_iterator it = st.begin();
|
||||||
|
os<<*it;
|
||||||
|
it++;
|
||||||
|
while(it != st.end()) {
|
||||||
|
os<<", "<<*it;
|
||||||
|
it++;
|
||||||
|
}
|
||||||
|
os<<'}';
|
||||||
|
return os;
|
||||||
|
}
|
||||||
|
|
||||||
|
template<class KeyType, class ContainType>
|
||||||
|
bool IsIn(const ContainType& contain, const KeyType& key) {
|
||||||
|
return contain.end() != contain.find(key);
|
||||||
|
}
|
||||||
|
|
||||||
|
template<class T>
|
||||||
|
basic_string<T> & operator << (basic_string<T> & s, ifstream & ifs) {
|
||||||
|
return s.assign((istreambuf_iterator<T>(ifs)), istreambuf_iterator<T>());
|
||||||
|
}
|
||||||
|
|
||||||
|
template<class T>
|
||||||
|
ofstream & operator << (ofstream & ofs, const basic_string<T>& s) {
|
||||||
|
ostreambuf_iterator<T> itr (ofs);
|
||||||
|
copy(s.begin(), s.end(), itr);
|
||||||
|
return ofs;
|
||||||
|
}
|
||||||
|
|
||||||
|
} // namespace std
|
||||||
|
|
||||||
|
#endif
|
|
@ -0,0 +1,382 @@
|
||||||
|
/************************************
|
||||||
|
* file enc : ascii
|
||||||
|
* author : wuyanyi09@gmail.com
|
||||||
|
************************************/
|
||||||
|
#ifndef LIMONP_STR_FUNCTS_H
|
||||||
|
#define LIMONP_STR_FUNCTS_H
|
||||||
|
#include <stdint.h>
|
||||||
|
#include <stdio.h>
|
||||||
|
#include <stdarg.h>
|
||||||
|
#include <memory.h>
|
||||||
|
#include <sys/types.h>
|
||||||
|
#include <fstream>
|
||||||
|
#include <iostream>
|
||||||
|
#include <string>
|
||||||
|
#include <vector>
|
||||||
|
#include <algorithm>
|
||||||
|
#include <cctype>
|
||||||
|
#include <map>
|
||||||
|
#include <functional>
|
||||||
|
#include <locale>
|
||||||
|
#include <sstream>
|
||||||
|
#include <iterator>
|
||||||
|
#include <algorithm>
|
||||||
|
#include "StdExtension.hpp"
|
||||||
|
|
||||||
|
namespace limonp {
|
||||||
|
using namespace std;
|
||||||
|
inline string StringFormat(const char* fmt, ...) {
|
||||||
|
int size = 256;
|
||||||
|
std::string str;
|
||||||
|
va_list ap;
|
||||||
|
while (1) {
|
||||||
|
str.resize(size);
|
||||||
|
va_start(ap, fmt);
|
||||||
|
int n = vsnprintf(&str[0], size, fmt, ap); // write into the string's own buffer, not through the const pointer from c_str()
|
||||||
|
va_end(ap);
|
||||||
|
if (n > -1 && n < size) {
|
||||||
|
str.resize(n);
|
||||||
|
return str;
|
||||||
|
}
|
||||||
|
if (n > -1)
|
||||||
|
size = n + 1;
|
||||||
|
else
|
||||||
|
size *= 2;
|
||||||
|
}
|
||||||
|
return str;
|
||||||
|
}
|
||||||
|
|
||||||
|
template<class T>
|
||||||
|
void Join(T begin, T end, string& res, const string& connector) {
|
||||||
|
if(begin == end) {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
stringstream ss;
|
||||||
|
ss<<*begin;
|
||||||
|
begin++;
|
||||||
|
while(begin != end) {
|
||||||
|
ss << connector << *begin;
|
||||||
|
begin ++;
|
||||||
|
}
|
||||||
|
res = ss.str();
|
||||||
|
}
|
||||||
|
|
||||||
|
template<class T>
|
||||||
|
string Join(T begin, T end, const string& connector) {
|
||||||
|
string res;
|
||||||
|
Join(begin ,end, res, connector);
|
||||||
|
return res;
|
||||||
|
}
|
||||||
|
|
||||||
|
inline string& Upper(string& str) {
|
||||||
|
transform(str.begin(), str.end(), str.begin(), (int (*)(int))toupper);
|
||||||
|
return str;
|
||||||
|
}
|
||||||
|
|
||||||
|
inline string& Lower(string& str) {
|
||||||
|
transform(str.begin(), str.end(), str.begin(), (int (*)(int))tolower);
|
||||||
|
return str;
|
||||||
|
}
|
||||||
|
|
||||||
|
inline bool IsSpace(unsigned c) {
|
||||||
|
// Passing a value outside the unsigned char range to isspace is undefined behavior (it can crash), so values above 0xff are rejected first.
|
||||||
|
return c > 0xff ? false : std::isspace(c & 0xff);
|
||||||
|
}
|
||||||
|
|
||||||
|
inline std::string& LTrim(std::string &s) {
|
||||||
|
s.erase(s.begin(), std::find_if(s.begin(), s.end(), std::not1(std::ptr_fun<unsigned, bool>(IsSpace))));
|
||||||
|
return s;
|
||||||
|
}
|
||||||
|
|
||||||
|
inline std::string& RTrim(std::string &s) {
|
||||||
|
s.erase(std::find_if(s.rbegin(), s.rend(), std::not1(std::ptr_fun<unsigned, bool>(IsSpace))).base(), s.end());
|
||||||
|
return s;
|
||||||
|
}
|
||||||
|
|
||||||
|
inline std::string& Trim(std::string &s) {
|
||||||
|
return LTrim(RTrim(s));
|
||||||
|
}
|
||||||
|
|
||||||
|
inline std::string& LTrim(std::string & s, char x) {
|
||||||
|
s.erase(s.begin(), std::find_if(s.begin(), s.end(), std::not1(std::bind2nd(std::equal_to<char>(), x))));
|
||||||
|
return s;
|
||||||
|
}
|
||||||
|
|
||||||
|
inline std::string& RTrim(std::string & s, char x) {
|
||||||
|
s.erase(std::find_if(s.rbegin(), s.rend(), std::not1(std::bind2nd(std::equal_to<char>(), x))).base(), s.end());
|
||||||
|
return s;
|
||||||
|
}
|
||||||
|
|
||||||
|
inline std::string& Trim(std::string &s, char x) {
|
||||||
|
return LTrim(RTrim(s, x), x);
|
||||||
|
}
|
||||||
|
|
||||||
|
inline void Split(const string& src, vector<string>& res, const string& pattern, size_t maxsplit = string::npos) {
|
||||||
|
res.clear();
|
||||||
|
size_t Start = 0;
|
||||||
|
size_t end = 0;
|
||||||
|
string sub;
|
||||||
|
while(Start < src.size()) {
|
||||||
|
end = src.find_first_of(pattern, Start);
|
||||||
|
if(string::npos == end || res.size() >= maxsplit) {
|
||||||
|
sub = src.substr(Start);
|
||||||
|
res.push_back(sub);
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
sub = src.substr(Start, end - Start);
|
||||||
|
res.push_back(sub);
|
||||||
|
Start = end + 1;
|
||||||
|
}
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
inline vector<string> Split(const string& src, const string& pattern, size_t maxsplit = string::npos) {
|
||||||
|
vector<string> res;
|
||||||
|
Split(src, res, pattern, maxsplit);
|
||||||
|
return res;
|
||||||
|
}
|
||||||
|
|
||||||
|
inline bool StartsWith(const string& str, const string& prefix) {
|
||||||
|
if(prefix.length() > str.length()) {
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
return 0 == str.compare(0, prefix.length(), prefix);
|
||||||
|
}
|
||||||
|
|
||||||
|
inline bool EndsWith(const string& str, const string& suffix) {
|
||||||
|
if(suffix.length() > str.length()) {
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
return 0 == str.compare(str.length() - suffix.length(), suffix.length(), suffix);
|
||||||
|
}
|
||||||
|
|
||||||
|
inline bool IsInStr(const string& str, char ch) {
|
||||||
|
return str.find(ch) != string::npos;
|
||||||
|
}
|
||||||
|
|
||||||
|
inline uint16_t TwocharToUint16(char high, char low) {
|
||||||
|
return (((uint16_t(high) & 0x00ff ) << 8) | (uint16_t(low) & 0x00ff));
|
||||||
|
}
|
||||||
|
|
||||||
|
template <class Uint16Container>
|
||||||
|
bool Utf8ToUnicode(const char * const str, size_t len, Uint16Container& vec) {
|
||||||
|
if(!str) {
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
char ch1, ch2;
|
||||||
|
uint16_t tmp;
|
||||||
|
vec.clear();
|
||||||
|
for(size_t i = 0; i < len;) {
|
||||||
|
if(!(str[i] & 0x80)) { // 0xxxxxxx
|
||||||
|
vec.push_back(str[i]);
|
||||||
|
i++;
|
||||||
|
} else if ((uint8_t)str[i] <= 0xdf && i + 1 < len) { // 110xxxxx
|
||||||
|
ch1 = (str[i] >> 2) & 0x07;
|
||||||
|
ch2 = (str[i+1] & 0x3f) | ((str[i] & 0x03) << 6 );
|
||||||
|
tmp = (((uint16_t(ch1) & 0x00ff ) << 8) | (uint16_t(ch2) & 0x00ff));
|
||||||
|
vec.push_back(tmp);
|
||||||
|
i += 2;
|
||||||
|
} else if((uint8_t)str[i] <= 0xef && i + 2 < len) {
|
||||||
|
ch1 = ((uint8_t)str[i] << 4) | ((str[i+1] >> 2) & 0x0f );
|
||||||
|
ch2 = (((uint8_t)str[i+1]<<6) & 0xc0) | (str[i+2] & 0x3f);
|
||||||
|
tmp = (((uint16_t(ch1) & 0x00ff ) << 8) | (uint16_t(ch2) & 0x00ff));
|
||||||
|
vec.push_back(tmp);
|
||||||
|
i += 3;
|
||||||
|
} else {
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
|
||||||
|
template <class Uint16Container>
|
||||||
|
bool Utf8ToUnicode(const string& str, Uint16Container& vec) {
|
||||||
|
return Utf8ToUnicode(str.c_str(), str.size(), vec);
|
||||||
|
}
|
||||||
|
|
||||||
|
template <class Uint32Container>
|
||||||
|
bool Utf8ToUnicode32(const char * str, size_t size, Uint32Container& vec) {
|
||||||
|
uint32_t tmp;
|
||||||
|
vec.clear();
|
||||||
|
for(size_t i = 0; i < size;) {
|
||||||
|
if(!(str[i] & 0x80)) { // 0xxxxxxx
|
||||||
|
// 7bit, total 7bit
|
||||||
|
tmp = (uint8_t)(str[i]) & 0x7f;
|
||||||
|
i++;
|
||||||
|
} else if ((uint8_t)str[i] <= 0xdf && i + 1 < size) { // 110xxxxx
|
||||||
|
// 5bit, total 5bit
|
||||||
|
tmp = (uint8_t)(str[i]) & 0x1f;
|
||||||
|
|
||||||
|
// 6bit, total 11bit
|
||||||
|
tmp <<= 6;
|
||||||
|
tmp |= (uint8_t)(str[i+1]) & 0x3f;
|
||||||
|
i += 2;
|
||||||
|
} else if((uint8_t)str[i] <= 0xef && i + 2 < size) { // 1110xxxx
|
||||||
|
// 4bit, total 4bit
|
||||||
|
tmp = (uint8_t)(str[i]) & 0x0f;
|
||||||
|
|
||||||
|
// 6bit, total 10bit
|
||||||
|
tmp <<= 6;
|
||||||
|
tmp |= (uint8_t)(str[i+1]) & 0x3f;
|
||||||
|
|
||||||
|
// 6bit, total 16bit
|
||||||
|
tmp <<= 6;
|
||||||
|
tmp |= (uint8_t)(str[i+2]) & 0x3f;
|
||||||
|
|
||||||
|
i += 3;
|
||||||
|
} else if((uint8_t)str[i] <= 0xf7 && i + 3 < size) { // 11110xxx
|
||||||
|
// 3bit, total 3bit
|
||||||
|
tmp = (uint8_t)(str[i]) & 0x07;
|
||||||
|
|
||||||
|
// 6bit, total 9bit
|
||||||
|
tmp <<= 6;
|
||||||
|
tmp |= (uint8_t)(str[i+1]) & 0x3f;
|
||||||
|
|
||||||
|
// 6bit, total 15bit
|
||||||
|
tmp <<= 6;
|
||||||
|
tmp |= (uint8_t)(str[i+2]) & 0x3f;
|
||||||
|
|
||||||
|
// 6bit, total 21bit
|
||||||
|
tmp <<= 6;
|
||||||
|
tmp |= (uint8_t)(str[i+3]) & 0x3f;
|
||||||
|
|
||||||
|
i += 4;
|
||||||
|
} else {
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
vec.push_back(tmp);
|
||||||
|
}
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
|
||||||
|
template <class Uint32Container>
|
||||||
|
bool Utf8ToUnicode32(const string& str, Uint32Container& vec) {
|
||||||
|
return Utf8ToUnicode32(str.data(), str.size(), vec);
|
||||||
|
}
|
||||||
|
|
||||||
|
inline int UnicodeToUtf8Bytes(uint32_t ui){
|
||||||
|
if(ui <= 0x7f) {
|
||||||
|
return 1;
|
||||||
|
} else if(ui <= 0x7ff) {
|
||||||
|
return 2;
|
||||||
|
} else if(ui <= 0xffff) {
|
||||||
|
return 3;
|
||||||
|
} else {
|
||||||
|
return 4;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
template <class Uint32ContainerConIter>
|
||||||
|
void Unicode32ToUtf8(Uint32ContainerConIter begin, Uint32ContainerConIter end, string& res) {
|
||||||
|
res.clear();
|
||||||
|
uint32_t ui;
|
||||||
|
while(begin != end) {
|
||||||
|
ui = *begin;
|
||||||
|
if(ui <= 0x7f) {
|
||||||
|
res += char(ui);
|
||||||
|
} else if(ui <= 0x7ff) {
|
||||||
|
res += char(((ui >> 6) & 0x1f) | 0xc0);
|
||||||
|
res += char((ui & 0x3f) | 0x80);
|
||||||
|
} else if(ui <= 0xffff) {
|
||||||
|
res += char(((ui >> 12) & 0x0f) | 0xe0);
|
||||||
|
res += char(((ui >> 6) & 0x3f) | 0x80);
|
||||||
|
res += char((ui & 0x3f) | 0x80);
|
||||||
|
} else {
|
||||||
|
res += char(((ui >> 18) & 0x03) | 0xf0);
|
||||||
|
res += char(((ui >> 12) & 0x3f) | 0x80);
|
||||||
|
res += char(((ui >> 6) & 0x3f) | 0x80);
|
||||||
|
res += char((ui & 0x3f) | 0x80);
|
||||||
|
}
|
||||||
|
begin ++;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
template <class Uint16ContainerConIter>
|
||||||
|
void UnicodeToUtf8(Uint16ContainerConIter begin, Uint16ContainerConIter end, string& res) {
|
||||||
|
res.clear();
|
||||||
|
uint16_t ui;
|
||||||
|
while(begin != end) {
|
||||||
|
ui = *begin;
|
||||||
|
if(ui <= 0x7f) {
|
||||||
|
res += char(ui);
|
||||||
|
} else if(ui <= 0x7ff) {
|
||||||
|
res += char(((ui>>6) & 0x1f) | 0xc0);
|
||||||
|
res += char((ui & 0x3f) | 0x80);
|
||||||
|
} else {
|
||||||
|
res += char(((ui >> 12) & 0x0f )| 0xe0);
|
||||||
|
res += char(((ui>>6) & 0x3f )| 0x80 );
|
||||||
|
res += char((ui & 0x3f) | 0x80);
|
||||||
|
}
|
||||||
|
begin ++;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
template <class Uint16Container>
|
||||||
|
bool GBKTrans(const char* const str, size_t len, Uint16Container& vec) {
|
||||||
|
vec.clear();
|
||||||
|
if(!str) {
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
size_t i = 0;
|
||||||
|
while(i < len) {
|
||||||
|
if(0 == (str[i] & 0x80)) {
|
||||||
|
vec.push_back(uint16_t(str[i]));
|
||||||
|
i++;
|
||||||
|
} else {
|
||||||
|
if(i + 1 < len) { //&& (str[i+1] & 0x80))
|
||||||
|
uint16_t tmp = (((uint16_t(str[i]) & 0x00ff ) << 8) | (uint16_t(str[i+1]) & 0x00ff));
|
||||||
|
vec.push_back(tmp);
|
||||||
|
i += 2;
|
||||||
|
} else {
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
|
||||||
|
template <class Uint16Container>
|
||||||
|
bool GBKTrans(const string& str, Uint16Container& vec) {
|
||||||
|
return GBKTrans(str.c_str(), str.size(), vec);
|
||||||
|
}
|
||||||
|
|
||||||
|
template <class Uint16ContainerConIter>
|
||||||
|
void GBKTrans(Uint16ContainerConIter begin, Uint16ContainerConIter end, string& res) {
|
||||||
|
res.clear();
|
||||||
|
//pair<char, char> pa;
|
||||||
|
char first, second;
|
||||||
|
while(begin != end) {
|
||||||
|
//pa = uint16ToChar2(*begin);
|
||||||
|
first = ((*begin)>>8) & 0x00ff;
|
||||||
|
second = (*begin) & 0x00ff;
|
||||||
|
if(first & 0x80) {
|
||||||
|
res += first;
|
||||||
|
res += second;
|
||||||
|
} else {
|
||||||
|
res += second;
|
||||||
|
}
|
||||||
|
begin++;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/*
|
||||||
|
* format example: "%Y-%m-%d %H:%M:%S"
|
||||||
|
*/
|
||||||
|
// inline void GetTime(const string& format, string& timeStr) {
|
||||||
|
// time_t timeNow;
|
||||||
|
// time(&timeNow);
|
||||||
|
// timeStr.resize(64);
|
||||||
|
// size_t len = strftime((char*)timeStr.c_str(), timeStr.size(), format.c_str(), localtime(&timeNow));
|
||||||
|
// timeStr.resize(len);
|
||||||
|
// }
|
||||||
|
|
||||||
|
inline string PathJoin(const string& path1, const string& path2) {
|
||||||
|
if(EndsWith(path1, "/")) {
|
||||||
|
return path1 + path2;
|
||||||
|
}
|
||||||
|
return path1 + "/" + path2;
|
||||||
|
}
|
||||||
|
|
||||||
|
}
|
||||||
|
#endif
|
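Review note on `Utf8ToUnicode32` above: the decoder picks the sequence length from upper bounds on the lead byte only (`<= 0xdf`, `<= 0xef`, `<= 0xf7`). A stray continuation byte in the range 0x80 to 0xBF is therefore treated as a two-byte lead, and trailing bytes are not checked for the `10xxxxxx` pattern. The following is a minimal sketch of a stricter variant, not the library's implementation. `DecodeUtf8` is a hypothetical name that does not exist in limonp, and the sketch still accepts overlong encodings and surrogate code points.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not part of limonp): decodes UTF-8 into code points.
// Returns false on an invalid lead byte, a truncated sequence, or a bad
// continuation byte. Overlong forms and surrogates are NOT rejected here.
inline bool DecodeUtf8(const std::string& in, std::vector<uint32_t>& out) {
  out.clear();
  for (std::size_t i = 0; i < in.size();) {
    const uint8_t lead = static_cast<uint8_t>(in[i]);
    std::size_t len = 0;
    uint32_t cp = 0;
    if (lead < 0x80) {                 // 0xxxxxxx
      len = 1; cp = lead;
    } else if ((lead & 0xE0) == 0xC0) { // 110xxxxx
      len = 2; cp = lead & 0x1F;
    } else if ((lead & 0xF0) == 0xE0) { // 1110xxxx
      len = 3; cp = lead & 0x0F;
    } else if ((lead & 0xF8) == 0xF0) { // 11110xxx
      len = 4; cp = lead & 0x07;
    } else {
      return false;                     // stray continuation or invalid lead
    }
    if (i + len > in.size()) {
      return false;                     // truncated sequence
    }
    for (std::size_t k = 1; k < len; ++k) {
      const uint8_t b = static_cast<uint8_t>(in[i + k]);
      if ((b & 0xC0) != 0x80) {
        return false;                   // continuation must be 10xxxxxx
      }
      cp = (cp << 6) | (b & 0x3F);      // append 6 payload bits
    }
    out.push_back(cp);
    i += len;
  }
  return true;
}
```

Matching the lead byte against exact bit masks, rather than upper bounds, is what lets the sketch reject a lone continuation byte. Whether callers in this repository rely on the more permissive behaviour is not visible from this diff.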
|
@ -0,0 +1,44 @@
|
||||||
|
#ifndef LIMONP_THREAD_HPP
|
||||||
|
#define LIMONP_THREAD_HPP
|
||||||
|
|
||||||
|
#include "Logging.hpp"
|
||||||
|
#include "NonCopyable.hpp"
|
||||||
|
|
||||||
|
namespace limonp {
|
||||||
|
|
||||||
|
class IThread: NonCopyable {
|
||||||
|
public:
|
||||||
|
IThread(): isStarted(false), isJoined(false) {
|
||||||
|
}
|
||||||
|
virtual ~IThread() {
|
||||||
|
if(isStarted && !isJoined) {
|
||||||
|
XCHECK(!pthread_detach(thread_));
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
virtual void Run() = 0;
|
||||||
|
void Start() {
|
||||||
|
XCHECK(!isStarted);
|
||||||
|
XCHECK(!pthread_create(&thread_, NULL, Worker, this));
|
||||||
|
isStarted = true;
|
||||||
|
}
|
||||||
|
void Join() {
|
||||||
|
XCHECK(!isJoined);
|
||||||
|
XCHECK(!pthread_join(thread_, NULL));
|
||||||
|
isJoined = true;
|
||||||
|
}
|
||||||
|
private:
|
||||||
|
static void * Worker(void * data) {
|
||||||
|
IThread * ptr = (IThread* ) data;
|
||||||
|
ptr->Run();
|
||||||
|
return NULL;
|
||||||
|
}
|
||||||
|
|
||||||
|
pthread_t thread_;
|
||||||
|
bool isStarted;
|
||||||
|
bool isJoined;
|
||||||
|
}; // class IThread
|
||||||
|
|
||||||
|
} // namespace limonp
|
||||||
|
|
||||||
|
#endif // LIMONP_THREAD_HPP
|
|
@ -0,0 +1,86 @@
|
||||||
|
#ifndef LIMONP_THREAD_POOL_HPP
|
||||||
|
#define LIMONP_THREAD_POOL_HPP
|
||||||
|
|
||||||
|
#include "Thread.hpp"
|
||||||
|
#include "BlockingQueue.hpp"
|
||||||
|
#include "BoundedBlockingQueue.hpp"
|
||||||
|
#include "Closure.hpp"
|
||||||
|
|
||||||
|
namespace limonp {
|
||||||
|
|
||||||
|
using namespace std;
|
||||||
|
|
||||||
|
//class ThreadPool;
|
||||||
|
class ThreadPool: NonCopyable {
|
||||||
|
public:
|
||||||
|
class Worker: public IThread {
|
||||||
|
public:
|
||||||
|
Worker(ThreadPool* pool): ptThreadPool_(pool) {
|
||||||
|
assert(ptThreadPool_);
|
||||||
|
}
|
||||||
|
virtual ~Worker() {
|
||||||
|
}
|
||||||
|
|
||||||
|
virtual void Run() {
|
||||||
|
while (true) {
|
||||||
|
ClosureInterface* closure = ptThreadPool_->queue_.Pop();
|
||||||
|
if (closure == NULL) {
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
try {
|
||||||
|
closure->Run();
|
||||||
|
} catch(std::exception& e) {
|
||||||
|
XLOG(ERROR) << e.what();
|
||||||
|
} catch(...) {
|
||||||
|
XLOG(ERROR) << " unknown exception.";
|
||||||
|
}
|
||||||
|
delete closure;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
private:
|
||||||
|
ThreadPool * ptThreadPool_;
|
||||||
|
}; // class Worker
|
||||||
|
|
||||||
|
ThreadPool(size_t thread_num)
|
||||||
|
: threads_(thread_num),
|
||||||
|
queue_(thread_num) {
|
||||||
|
assert(thread_num);
|
||||||
|
for(size_t i = 0; i < threads_.size(); i ++) {
|
||||||
|
threads_[i] = new Worker(this);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
~ThreadPool() {
|
||||||
|
Stop();
|
||||||
|
}
|
||||||
|
|
||||||
|
void Start() {
|
||||||
|
for(size_t i = 0; i < threads_.size(); i++) {
|
||||||
|
threads_[i]->Start();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
void Stop() {
|
||||||
|
for(size_t i = 0; i < threads_.size(); i ++) {
|
||||||
|
queue_.Push(NULL);
|
||||||
|
}
|
||||||
|
for(size_t i = 0; i < threads_.size(); i ++) {
|
||||||
|
threads_[i]->Join();
|
||||||
|
delete threads_[i];
|
||||||
|
}
|
||||||
|
threads_.clear();
|
||||||
|
}
|
||||||
|
|
||||||
|
void Add(ClosureInterface* task) {
|
||||||
|
assert(task);
|
||||||
|
queue_.Push(task);
|
||||||
|
}
|
||||||
|
|
||||||
|
private:
|
||||||
|
friend class Worker;
|
||||||
|
|
||||||
|
vector<IThread*> threads_;
|
||||||
|
BoundedBlockingQueue<ClosureInterface*> queue_;
|
||||||
|
}; // class ThreadPool
|
||||||
|
|
||||||
|
} // namespace limonp
|
||||||
|
|
||||||
|
#endif // LIMONP_THREAD_POOL_HPP
|
|
@ -0,0 +1,22 @@
|
||||||
|
INCLUDEPATH += $$PWD
|
||||||
|
|
||||||
|
HEADERS += \
|
||||||
|
$$PWD/ArgvContext.hpp \
|
||||||
|
$$PWD/BlockingQueue.hpp \
|
||||||
|
$$PWD/BoundedBlockingQueue.hpp \
|
||||||
|
$$PWD/BoundedQueue.hpp \
|
||||||
|
$$PWD/Closure.hpp \
|
||||||
|
$$PWD/Colors.hpp \
|
||||||
|
$$PWD/Condition.hpp \
|
||||||
|
$$PWD/Config.hpp \
|
||||||
|
$$PWD/FileLock.hpp \
|
||||||
|
$$PWD/ForcePublic.hpp \
|
||||||
|
$$PWD/LocalVector.hpp \
|
||||||
|
$$PWD/Logging.hpp \
|
||||||
|
$$PWD/Md5.hpp \
|
||||||
|
$$PWD/MutexLock.hpp \
|
||||||
|
$$PWD/NonCopyable.hpp \
|
||||||
|
$$PWD/StdExtension.hpp \
|
||||||
|
$$PWD/StringUtil.hpp \
|
||||||
|
$$PWD/Thread.hpp \
|
||||||
|
$$PWD/ThreadPool.hpp
|
|
@ -0,0 +1,276 @@
|
||||||
|
/*
|
||||||
|
* Copyright (C) 2022, KylinSoft Co., Ltd.
|
||||||
|
*
|
||||||
|
* This program is free software: you can redistribute it and/or modify
|
||||||
|
* it under the terms of the GNU General Public License as published by
|
||||||
|
* the Free Software Foundation, either version 3 of the License, or
|
||||||
|
* (at your option) any later version.
|
||||||
|
*
|
||||||
|
* This program is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
* GNU General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU General Public License
|
||||||
|
* along with this program. If not, see <https://www.gnu.org/licenses/>.
|
||||||
|
*
|
||||||
|
* Authors: jixiaoxu <jixiaoxu@kylinos.cn>
|
||||||
|
*
|
||||||
|
*/
|
||||||
|
#include <cmath>
|
||||||
|
#include "segment-trie.h"
|
||||||
|
|
||||||
|
DictTrie::DictTrie(const vector<string> file_paths, string dat_cache_path)
|
||||||
|
: StorageBase<DatMemElem, false, DictCacheFileHeader>(file_paths, dat_cache_path)
|
||||||
|
{
|
||||||
|
this->Init();
|
||||||
|
}
|
||||||
|
|
||||||
|
DictTrie::DictTrie(const string &dict_path, const string &user_dict_paths, const string &dat_cache_path)
|
||||||
|
: StorageBase<DatMemElem, false, DictCacheFileHeader>(vector<string>{dict_path, user_dict_paths}, dat_cache_path)
|
||||||
|
{
|
||||||
|
this->Init();
|
||||||
|
}
|
||||||
|
|
||||||
|
void DictTrie::LoadSourceFile(const string &dat_cache_file, const string &md5)
|
||||||
|
{
|
||||||
|
DictCacheFileHeader header;
|
||||||
|
assert(sizeof(header.md5_hex) == md5.size());
|
||||||
|
memcpy(&header.md5_hex[0], md5.c_str(), md5.size());
|
||||||
|
|
||||||
|
int offset(0), elements_num(0), write_bytes(0), data_trie_size(0);
|
||||||
|
string tmp_filepath = string(dat_cache_file) + "_XXXXXX";
|
||||||
|
umask(S_IWGRP | S_IWOTH);
|
||||||
|
const int fd = mkstemp(&tmp_filepath[0]);
|
||||||
|
assert(fd >= 0);
|
||||||
|
fchmod(fd, 0644);
|
||||||
|
|
||||||
|
write_bytes = write(fd, (const char *)&header, sizeof(DictCacheFileHeader));
|
||||||
|
|
||||||
|
this->PreLoad();
|
||||||
|
this->LoadDefaultDict(fd, write_bytes, offset, elements_num);
|
||||||
|
this->LoadUserDict(fd, write_bytes, offset, elements_num);
|
||||||
|
|
||||||
|
write_bytes += write(fd, this->GetDataTrieArray(), this->GetDataTrieTotalSize());
|
||||||
|
|
||||||
|
lseek(fd, sizeof(header.md5_hex), SEEK_SET);
|
||||||
|
write(fd, &elements_num, sizeof(int));
|
||||||
|
write(fd, &offset, sizeof(int));
|
||||||
|
data_trie_size = this->GetDataTrieSize();
|
||||||
|
write(fd, &data_trie_size, sizeof(int));
|
||||||
|
write(fd, &m_min_weight, sizeof(double));
|
||||||
|
|
||||||
|
close(fd);
|
||||||
|
assert((size_t)write_bytes == sizeof(DictCacheFileHeader) + offset + this->GetDataTrieTotalSize());
|
||||||
|
|
||||||
|
const auto rename_ret = rename(tmp_filepath.c_str(), dat_cache_file.c_str());
|
||||||
|
assert(0 == rename_ret);
|
||||||
|
}
|
||||||
|
|
||||||
|
const DatMemElem * DictTrie::Find(const string &key) const
|
||||||
|
{
|
||||||
|
int result = this->ExactMatchSearch(key.c_str(), key.size());
|
||||||
|
if (result < 0)
|
||||||
|
return nullptr;
|
||||||
|
return &this->GetElementPtr()[result];
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
void DictTrie::FindDatDag(RuneStrArray::const_iterator begin, RuneStrArray::const_iterator end, vector<DatDag> &res, size_t max_word_len) const {
|
||||||
|
|
||||||
|
res.clear();
|
||||||
|
res.resize(end - begin);
|
||||||
|
|
||||||
|
string text_str;
|
||||||
|
EncodeRunesToString(begin, end, text_str);
|
||||||
|
|
||||||
|
static const size_t max_num = 128;
|
||||||
|
result_pair_type result_pairs[max_num] = {};
|
||||||
|
|
||||||
|
for (size_t i = 0, begin_pos = 0; i < size_t(end - begin); i++) {
|
||||||
|
|
||||||
|
std::size_t num_results = this->CommonPrefixSearch(&text_str[begin_pos], &result_pairs[0], max_num);
|
||||||
|
|
||||||
|
res[i].nexts.push_back(pair<size_t, const DatMemElem *>(i + 1, nullptr));
|
||||||
|
|
||||||
|
for (std::size_t idx = 0; idx < num_results; ++idx) {
|
||||||
|
auto & match = result_pairs[idx];
|
||||||
|
|
||||||
|
if ((match.value < 0) || ((size_t)match.value >= this->GetCacheFileHeaderPtr()->elements_size)) {
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
|
auto const char_num = Utf8CharNum(&text_str[begin_pos], match.length);
|
||||||
|
|
||||||
|
if (char_num > max_word_len) {
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
|
const DatMemElem * pValue = &this->GetElementPtr()[match.value];
|
||||||
|
|
||||||
|
if (1 == char_num) {
|
||||||
|
res[i].nexts[0].second = pValue;
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
|
res[i].nexts.push_back(pair<size_t, const DatMemElem *>(i + char_num, pValue));
|
||||||
|
}
|
||||||
|
|
||||||
|
begin_pos += limonp::UnicodeToUtf8Bytes((begin + i)->rune);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
void DictTrie::FindWordRange(RuneStrArray::const_iterator begin, RuneStrArray::const_iterator end, vector<WordRange> &words, size_t max_word_len) const {
|
||||||
|
|
||||||
|
string text_str;
|
||||||
|
EncodeRunesToString(begin, end, text_str);
|
||||||
|
|
||||||
|
static const size_t max_num = 128;
|
||||||
|
result_pair_type result_pairs[max_num] = {}; // holds the dictionary lookup results
|
||||||
|
size_t str_size = end - begin;
|
||||||
|
double max_weight[str_size]; // maximum weight of the path from each position to the end, filled in reverse
|
||||||
|
for (size_t i = 0; i<str_size; i++) {
|
||||||
|
max_weight[i] = -3.14e+100;
|
||||||
|
}
|
||||||
|
size_t max_next[str_size]; // segmentation chosen by the dynamic programming: end index of the word starting at each position
|
||||||
|
//memset(max_next,-1,str_size*sizeof(size_t));
|
||||||
|
|
||||||
|
double val(0);
|
||||||
|
for (size_t i = 0, begin_pos = text_str.size(); i < str_size; i++) {
|
||||||
|
size_t nextPos = str_size - i; // computed in reverse, from the end of the text
|
||||||
|
begin_pos -= (end - i - 1)->len;
|
||||||
|
|
||||||
|
std::size_t num_results = this->CommonPrefixSearch(&text_str[begin_pos], &result_pairs[0], max_num);
|
||||||
|
if (0 == num_results) { // not in the dictionary: segment the character on its own
|
||||||
|
val = GetMinWeight();
|
||||||
|
if (nextPos < str_size) {
|
||||||
|
val += max_weight[nextPos];
|
||||||
|
}
|
||||||
|
if ((nextPos <= str_size) && (val > max_weight[nextPos - 1])) {
|
||||||
|
max_weight[nextPos - 1] = val;
|
||||||
|
max_next[nextPos - 1] = nextPos;
|
||||||
|
}
|
||||||
|
} else { // in the dictionary: compute the maximum-probability path over all lookup results
|
||||||
|
for (std::size_t idx = 0; idx < num_results; ++idx) {
|
||||||
|
auto & match = result_pairs[idx];
|
||||||
|
if ((match.value < 0) || ((uint32_t)match.value >= this->GetCacheFileHeaderPtr()->elements_size)) {
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
auto const char_num = Utf8CharNum(&text_str[begin_pos], match.length);
|
||||||
|
if (char_num > max_word_len) {
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
auto * pValue = &this->GetElementPtr()[match.value];
|
||||||
|
|
||||||
|
val = pValue->weight;
|
||||||
|
if (1 == char_num) {
|
||||||
|
if (nextPos < str_size) {
|
||||||
|
val += max_weight[nextPos];
|
||||||
|
}
|
||||||
|
if ((nextPos <= str_size) && (val > max_weight[nextPos - 1])) {
|
||||||
|
max_weight[nextPos - 1] = val;
|
||||||
|
max_next[nextPos - 1] = nextPos;
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
if (nextPos - 1 + char_num < str_size) {
|
||||||
|
val += max_weight[nextPos - 1 + char_num];
|
||||||
|
}
|
||||||
|
if ((nextPos - 1 + char_num <= str_size) && (val > max_weight[nextPos - 1])) {
|
||||||
|
max_weight[nextPos - 1] = val;
|
||||||
|
max_next[nextPos - 1] = nextPos - 1 + char_num;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
for (size_t i = 0; i < str_size;) { // collect the dynamic programming result
|
||||||
|
assert(max_next[i] > i);
|
||||||
|
assert(max_next[i] <= str_size);
|
||||||
|
WordRange wr(begin + i, begin + max_next[i] - 1);
|
||||||
|
words.push_back(wr);
|
||||||
|
i = max_next[i];
|
||||||
|
}
|
||||||
|
}
|
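Review note on `FindWordRange` above: the function fills `max_weight` and `max_next` from the last character backwards, then walks `max_next` forwards to emit the words. The following is a minimal sketch of that backward dynamic programming under simplifying assumptions. `ToyDict` and `SegmentMaxProb` are hypothetical names, and a plain hash map stands in for the double-array trie and `CommonPrefixSearch`. Unlike the original, the single-character fallback weight is always considered, not only when the lookup returns nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the trie: maps a word to its log-probability.
using ToyDict = std::unordered_map<std::string, double>;

// Returns [begin, end) index pairs over `chars`, one entry per character
// (each entry may be a multi-byte UTF-8 character).
inline std::vector<std::pair<std::size_t, std::size_t>> SegmentMaxProb(
    const std::vector<std::string>& chars, const ToyDict& dict,
    double min_weight, std::size_t max_word_len) {
  const std::size_t n = chars.size();
  // best[i]: best total weight for chars[i..n); best[n] is the empty suffix.
  std::vector<double> best(n + 1, -std::numeric_limits<double>::infinity());
  std::vector<std::size_t> next(n + 1, 0);
  best[n] = 0.0;
  for (std::size_t i = n; i-- > 0;) {  // reverse pass, last character first
    // Fallback: treat chars[i] as an unknown single-character word.
    best[i] = min_weight + best[i + 1];
    next[i] = i + 1;
    std::string word;
    for (std::size_t len = 1; len <= max_word_len && i + len <= n; ++len) {
      word += chars[i + len - 1];
      const auto it = dict.find(word);
      if (it == dict.end()) {
        continue;
      }
      const double candidate = it->second + best[i + len];
      if (candidate > best[i]) {  // keep the most probable continuation
        best[i] = candidate;
        next[i] = i + len;
      }
    }
  }
  // Forward pass: follow the chosen jumps to collect the words.
  std::vector<std::pair<std::size_t, std::size_t>> words;
  for (std::size_t i = 0; i < n; i = next[i]) {
    assert(next[i] > i && next[i] <= n);
    words.emplace_back(i, next[i]);
  }
  return words;
}
```

Using an extra slot `best[n]` for the empty suffix removes the `nextPos < str_size` special cases of the original, at the cost of one more array element. The sketch also uses `std::vector` instead of variable-length arrays, which are a compiler extension in C++.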
||||||
|
|
||||||
|
bool DictTrie::IsUserDictSingleChineseWord(const Rune &word) const {
|
||||||
|
return IsIn(m_user_dict_single_chinese_word, word);
|
||||||
|
}
|
||||||
|
|
||||||
|
void DictTrie::PreLoad()
|
||||||
|
{
|
||||||
|
ifstream ifs(DICT_PATH);
|
||||||
|
string line;
|
||||||
|
vector<string> buf;
|
||||||
|
|
||||||
|
for (; getline(ifs, line);) {
|
||||||
|
if (limonp::StartsWith(line, "#") or line.empty()) {
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
limonp::Split(line, buf, " ");
|
||||||
|
if (buf.size() != 3)
|
||||||
|
continue;
|
||||||
|
m_freq_sum += atof(buf[1].c_str());
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
void DictTrie::LoadDefaultDict(const int &fd, int &write_bytes, int &offset, int &elements_num)
|
||||||
|
{
|
||||||
|
ifstream ifs(DICT_PATH);
|
||||||
|
string line;
|
||||||
|
vector<string> buf;
|
||||||
|
|
||||||
|
for (; getline(ifs, line);) {
|
||||||
|
if (limonp::StartsWith(line, "#") or line.empty()) {
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
limonp::Split(line, buf, " ");
|
||||||
|
if (buf.size() != 3)
|
||||||
|
continue;
|
||||||
|
DatMemElem node_info;
|
||||||
|
node_info.weight = log(atof(buf[1].c_str()) / m_freq_sum);
|
||||||
|
node_info.SetTag(buf[2]);
|
||||||
|
this->Update(buf[0].c_str(), buf[0].size(), elements_num);
|
||||||
|
offset += (sizeof(DatMemElem));
|
||||||
|
elements_num++;
|
||||||
|
if (m_min_weight > node_info.weight) {
|
||||||
|
m_min_weight = node_info.weight;
|
||||||
|
}
|
||||||
|
write_bytes += write(fd, &node_info, sizeof(DatMemElem));
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
void DictTrie::LoadUserDict(const int &fd, int &write_bytes, int &offset, int &elements_num)
|
||||||
|
{
|
||||||
|
ifstream ifs(USER_DICT_PATH);
|
||||||
|
string line;
|
||||||
|
vector<string> buf;
|
||||||
|
for (; getline(ifs, line);) {
|
||||||
|
if (limonp::StartsWith(line, "#") or line.empty()) {
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
limonp::Split(line, buf, " ");
|
||||||
|
if (buf.size() != 3)
|
||||||
|
continue;
|
||||||
|
DatMemElem node_info;
|
||||||
|
assert(m_freq_sum > 0.0);
|
||||||
|
const int freq = atoi(buf[1].c_str());
|
||||||
|
node_info.weight = log(1.0 * freq / m_freq_sum);
|
||||||
|
node_info.SetTag(buf[2]);
|
||||||
|
this->Update(buf[0].c_str(), buf[0].size(), elements_num);
|
||||||
|
offset += (sizeof(DatMemElem));
|
||||||
|
elements_num++;
|
||||||
|
write_bytes += write(fd, &node_info, sizeof(DatMemElem));
|
||||||
|
if (Utf8CharNum(buf[0]) == 1) {
|
||||||
|
RuneArray word;
|
||||||
|
if (DecodeRunesInString(buf[0], word)) {
|
||||||
|
m_user_dict_single_chinese_word.insert(word[0]);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
inline double DictTrie::GetMinWeight() const
|
||||||
|
{
|
||||||
|
return this->GetCacheFileHeaderPtr()->min_weight;
|
||||||
|
}
|
|
@ -0,0 +1,62 @@
|
||||||
|
/*
|
||||||
|
* Copyright (C) 2022, KylinSoft Co., Ltd.
|
||||||
|
*
|
||||||
|
* This program is free software: you can redistribute it and/or modify
|
||||||
|
* it under the terms of the GNU General Public License as published by
|
||||||
|
* the Free Software Foundation, either version 3 of the License, or
|
||||||
|
* (at your option) any later version.
|
||||||
|
*
|
||||||
|
* This program is distributed in the hope that it will be useful,
|
||||||
|
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
* GNU General Public License for more details.
|
||||||
|
*
|
||||||
|
* You should have received a copy of the GNU General Public License
|
||||||
|
* along with this program. If not, see <https://www.gnu.org/licenses/>.
|
||||||
|
*
|
||||||
|
* Authors: jixiaoxu <jixiaoxu@kylinos.cn>
|
||||||
|
*
|
||||||
|
*/
|
||||||
|
#ifndef SegmentTrie_H
|
||||||
|
#define SegmentTrie_H
|
||||||
|
|
||||||
|
#include "storage-base.hpp"
|
||||||
|
#include "cppjieba/Unicode.hpp"
|
||||||
|
|
||||||
|
using namespace cppjieba;
|
||||||
|
|
||||||
|
const char * const DICT_PATH = "/usr/share/ukui-search/res/dict/jieba.dict.utf8";
|
||||||
|
const char * const USER_DICT_PATH = "/usr/share/ukui-search/res/dict/user.dict.utf8";
|
||||||
|
|
||||||
|
struct DictCacheFileHeader : CacheFileHeaderBase
|
||||||
|
{
|
||||||
|
double min_weight = 0;
|
||||||
|
};
|
||||||
|
|
||||||
|
class DictTrie : public StorageBase<DatMemElem, false, DictCacheFileHeader>
|
||||||
|
{
|
||||||
|
public:
|
||||||
|
DictTrie(const vector<string> file_paths, string dat_cache_path = "");
|
||||||
|
DictTrie(const string& dict_path, const string& user_dict_paths = "", const string & dat_cache_path = "");
|
||||||
|
void LoadSourceFile(const string &dat_cache_file, const string &md5) override;
|
||||||
|
|
||||||
|
const DatMemElem *Find(const string &key) const;
|
||||||
|
void FindDatDag(RuneStrArray::const_iterator begin, RuneStrArray::const_iterator end,
|
||||||
|
vector<struct DatDag>&res, size_t max_word_len = MAX_WORD_LENGTH) const;
|
||||||
|
void FindWordRange(RuneStrArray::const_iterator begin, RuneStrArray::const_iterator end,
|
||||||
|
vector<WordRange>& words, size_t max_word_len = MAX_WORD_LENGTH) const;
|
||||||
|
bool IsUserDictSingleChineseWord(const Rune& word) const;
|
||||||
|
|
||||||
|
private:
|
||||||
|
DictTrie();
|
||||||
|
void PreLoad();
|
||||||
|
void LoadDefaultDict(const int &fd, int &write_bytes, int &offset, int &elements_num);
|
||||||
|
void LoadUserDict(const int &fd, int &write_bytes, int &offset, int &elements_num);
|
||||||
|
double GetMinWeight() const;
|
||||||
|
|
||||||
|
double m_freq_sum = 0.0;
|
||||||
|
double m_min_weight = 3.14e+100;
|
||||||
|
unordered_set<Rune> m_user_dict_single_chinese_word;
|
||||||
|
};
|
||||||
|
|
||||||
|
#endif // SegmentTrie_H
|
|
@ -0,0 +1 @@
|
||||||
|
#include "chinese-segmentation.h"
|
|
@ -0,0 +1 @@
|
||||||
|
#include "hanzi-to-pinyin.h"
|
|
@ -0,0 +1,31 @@
|
||||||
|
# CppJieba Dictionaries
|
||||||
|
|
||||||
|
The file extension indicates the dictionary's encoding.
|
||||||
|
For example, filename.utf8 is UTF-8 encoded and filename.gbk is GBK encoded.
|
||||||
|
|
||||||
|
|
||||||
|
## Word Segmentation
|
||||||
|
|
||||||
|
### jieba.dict.utf8/gbk
|
||||||
|
|
||||||
|
The dictionary used by max-probability segmentation (MPSegment: Max Probability).
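
As a rough illustration of how a frequency dictionary can drive max-probability segmentation, the sketch below converts raw frequencies into log-probability weights and then picks the split with the highest total weight by dynamic programming. `BuildWeights`, `MaxProbSegment`, the `min_weight` fallback for unknown characters, and the single-byte character handling are all assumptions made for illustration; this is not the CppJieba implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

using WeightMap = std::unordered_map<std::string, double>;

// Hypothetical helper: turn raw frequencies into weights log(freq / freq_sum).
WeightMap BuildWeights(const std::unordered_map<std::string, double>& freqs) {
    double freq_sum = 0.0;
    for (const auto& kv : freqs) {
        freq_sum += kv.second;
    }
    assert(freq_sum > 0.0);
    WeightMap weights;
    for (const auto& kv : freqs) {
        weights[kv.first] = std::log(kv.second / freq_sum);
    }
    return weights;
}

// Hypothetical helper: split `text` so that the sum of word weights is maximal.
// Characters are treated as single bytes for brevity; a character that is not
// in the dictionary is emitted on its own with the weight `min_weight`.
std::vector<std::string> MaxProbSegment(const std::string& text,
                                        const WeightMap& weights,
                                        double min_weight,
                                        std::size_t max_word_len) {
    assert(max_word_len > 0);
    const std::size_t n = text.size();
    std::vector<double> best(n + 1, 0.0);      // best[i]: best score for text[i..n)
    std::vector<std::size_t> next(n + 1, n);   // next[i]: end of the word starting at i

    // Fill the table from right to left.
    for (std::size_t i = n; i-- > 0;) {
        best[i] = min_weight + best[i + 1];    // fallback: one unknown character
        next[i] = i + 1;
        for (std::size_t len = 1; len <= max_word_len && i + len <= n; ++len) {
            const auto it = weights.find(text.substr(i, len));
            if (it == weights.end()) {
                continue;
            }
            const double score = it->second + best[i + len];
            if (score > best[i]) {
                best[i] = score;
                next[i] = i + len;
            }
        }
    }

    // Walk the chosen path from the start of the text.
    std::vector<std::string> words;
    for (std::size_t i = 0; i < n; i = next[i]) {
        words.push_back(text.substr(i, next[i] - i));
    }
    return words;
}
```

With toy frequencies `ab`=50, `a`=10, `b`=10, `abc`=5, `c`=25, segmenting `"abc"` yields `ab` followed by `c`, because log(0.5) + log(0.25) is larger than log(0.05) for the single word `abc`.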
|
||||||
|
|
||||||
|
### hmm_model.utf8/gbk
|
||||||
|
|
||||||
|
The dictionary used by hidden Markov model segmentation (HMMSegment: Hidden Markov Model).
|
||||||
|
|
||||||
|
__MixSegment (which combines MPSegment and HMMSegment) uses both of the dictionaries above.__
|
||||||
|
|
||||||
|
|
||||||
|
## Keyword Extraction
|
||||||
|
|
||||||
|
### idf.utf8
|
||||||
|
|
||||||
|
IDF(Inverse Document Frequency)
|
||||||
|
KeywordExtractor uses the classic TF-IDF algorithm, so it needs a dictionary like this one to supply the IDF values.
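
A minimal sketch of TF-IDF ranking, assuming the text has already been segmented into words and the IDF table has been loaded into a map. `TfIdfRank` and the `default_idf` fallback for words missing from the table are hypothetical names chosen for this example, not the KeywordExtractor API.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper: score every distinct word as (occurrence count) * idf
// and return the words sorted by descending score (ties broken alphabetically).
std::vector<std::pair<std::string, double>> TfIdfRank(
        const std::vector<std::string>& words,
        const std::unordered_map<std::string, double>& idf,
        double default_idf) {
    assert(default_idf >= 0.0);

    // Term frequency: raw occurrence count of each word.
    std::unordered_map<std::string, double> tf;
    for (const auto& word : words) {
        tf[word] += 1.0;
    }

    // Combine with IDF; words missing from the table use default_idf.
    std::vector<std::pair<std::string, double>> ranked;
    for (const auto& kv : tf) {
        const auto it = idf.find(kv.first);
        const double word_idf = (it == idf.end()) ? default_idf : it->second;
        ranked.emplace_back(kv.first, kv.second * word_idf);
    }

    std::sort(ranked.begin(), ranked.end(),
              [](const std::pair<std::string, double>& a,
                 const std::pair<std::string, double>& b) {
                  if (a.second != b.second) {
                      return a.second > b.second;
                  }
                  return a.first < b.first;
              });
    return ranked;
}
```

A frequent word with a low IDF (such as a function word) ends up below a rarer, more distinctive word, which is the effect the IDF dictionary is meant to provide.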
|
||||||
|
|
||||||
|
### stop_words.utf8
|
||||||
|
|
||||||
|
The stop-word dictionary.
|
||||||
|
|
||||||
|
|