Text Representation Models

Installing python3-dbg and python3-dev:

$ sudo apt install python3-dbg python3-dev

The python3-dbg package ships short usage documentation in /usr/share/doc/python3-dbg/README.debug, which I use in the next step.

Appending unpacked GDB helper script /usr/share/doc/python3.5/gdbinit.gz to ~/.gdbinit:

zcat /usr/share/doc/python3.5/gdbinit.gz >> ~/.gdbinit

At first I followed "How can I get python stack trace information using GDB?", but libpython and related files could not be found, and neither could py-bt.

[gdb's Python interface](https://segmentfault.com/a/1190000005718889)

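<How to use the gdb Python debugging extension in a virtualenv>
I solved the problem by running strace on gdb and looking at its "open" system calls.

It seems that gdb searches for python-gdb.py in several paths that it guesses (based on the python binary), and whenever the file is not found it just fails.

What finally solved the problem was linking /usr/lib/debug/usr/bin/python2.7-gdb.py into the env's bin directory. The link name should be <python binary name>-gdb.py, in my case python2.7-dbg-gdb.py (…)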

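How to change the core file name format and save location
/proc/sys/kernel/core_pattern controls where core files are saved and how they are named. It can be modified with the following command (as root):

echo "/corefile/core-%e-%p-%t" > /proc/sys/kernel/core_pattern

This puts all core files under the /corefile directory, named core-<command name>-<pid>-<timestamp>.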
The available specifiers are:
%p - insert pid into filename
%u - insert current uid into filename
%g - insert current gid into filename
%s - insert signal that caused the coredump into the filename
%t - insert UNIX time that the coredump occurred into filename
%h - insert hostname where the coredump happened into filename
%e - insert coredumping executable name into filename
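
To check what file name a given pattern would produce before writing it to core_pattern, the substitution can be imitated in a few lines of Python. This is only a sketch: `expand_core_pattern` and its default values are hypothetical names introduced here, and the kernel's real handling of edge cases (such as unknown specifiers) may differ.

```python
def expand_core_pattern(pattern, *, exe, pid, uid=0, gid=0, sig=11, ts=0, host="localhost"):
    """Imitate the substitution of %-specifiers in a core_pattern string.

    Hypothetical helper for illustration only; it is not the kernel's code.
    """
    values = {"p": pid, "u": uid, "g": gid, "s": sig, "t": ts, "h": host, "e": exe, "%": "%"}
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "%" and i + 1 < len(pattern):
            # Replace the two-character specifier; unknown ones are dropped in this sketch
            out.append(str(values.get(pattern[i + 1], "")))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(expand_core_pattern("/corefile/core-%e-%p-%t", exe="python3", pid=1234, ts=1565000000))
```

With the pattern used above, a crash of python3 with pid 1234 would be named /corefile/core-python3-1234-1565000000.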

Official tutorial (Chinese, English)

How to change the Python Interpreter that gdb uses?


$ apt-get -qq update
$ apt-get install gdb python2.7-dbg python3-all-dbg
$ gdb -ex r -ex quit --args python2 -c "import sys ; print(sys.version)" # Py2.7
$ gdb -ex r -ex quit --args python3 -c "import sys ; print(sys.version)" # Py3.6

Debugging of CPython processes with gdb

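2018-06-01: Debugging the Python 3.6.3 interpreter with gdb on CentOS 7.x
Debugging a running Python process with gdb

Loading the libpython script

If your gdb is a vendor-modified build such as the one from Red Hat or Fedora, it has a --python option that specifies a Python extension script to load when gdb starts (this script extends gdb; it is not the script we want to debug).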

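$ gdb --python /path/to/libpython.py -p 1000
If you installed GNU gdb, you have to load libpython.py by hand after starting gdb. Note that sys.path takes the directory that contains libpython.py, not the file itself:

(gdb) python
> import sys
> sys.path.insert(0, '/path/to')  # directory that contains libpython.py
> import libpython
> end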
(gdb)

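Download libpython: Debugging Python C/C++ extensions in gdb

A visual tool: GDB dashboard

Python-Debugging
Debugging a running Python process with gdb: I found the idea in a Stack Overflow post. The trick is simply to run the script with a Python that has debug symbols, that is, with python-dbg or python2.7-dbg.

Integrating gdb with Python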

TensorFlow Errors

Posted 2019-08-06 | Updated 2019-08-07

Warnings after installing TensorFlow

Workaround: change the numpy version to 1.15.4.

import tensorflow as tf
~/.local/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
~/.local/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
~/.local/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
~/.local/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
~/.local/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
~/.local/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])

Link: https://github.com/tensorflow/tensorflow/issues/31249
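
If pinning numpy is not an option, the messages can also be filtered, since they are ordinary `FutureWarning`s. The sketch below uses only the standard `warnings` module. `import_quietly` and `noisy_import` are hypothetical names, and `noisy_import` merely stands in for `import tensorflow`; this is an alternative I am assuming is acceptable, not something taken from the linked issue.

```python
import warnings


def import_quietly(import_fn):
    """Call import_fn() with FutureWarning suppressed and return its result.

    The filter is only active inside the `with` block, so other
    FutureWarnings elsewhere in the program are still reported.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=FutureWarning)
        return import_fn()


def noisy_import():
    # Stand-in for `import tensorflow as tf` (assumption: used here so the
    # example runs without TensorFlow installed).
    warnings.warn("Passing (type, 1) or '1type' as a synonym of type is deprecated", FutureWarning)
    return "module"


if __name__ == "__main__":
    print(import_quietly(noisy_import))
```

In real use the callable would be something like `lambda: __import__("tensorflow")`. Filtering hides the symptom only; the numpy/TensorFlow version mismatch is still there.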

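Segmentation fault (core dumped)

"core" comes from magnetic-core memory and here simply means memory.

Typical causes: an out-of-bounds index, or access to memory that is inaccessible or does not exist.

On Linux, run ulimit -a to see the current limits, including the stack size.

ulimit -s 102400 sets the stack size to 100 MB.

Alternatively, to make the change permanent, append the following to the end of ~/.bashrc:
ulimit -c unlimited
ulimit -s unlimited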
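
The same limits can be inspected and raised from inside a Python process with the standard `resource` module (Unix only). This is a minimal sketch; `show_limits` and `enable_core_dumps` are hypothetical helper names, and an unprivileged process can only raise the soft limit up to the existing hard limit.

```python
import resource


def show_limits():
    """Return the (soft, hard) limits for core file size and stack size.

    A value of resource.RLIM_INFINITY corresponds to `unlimited` in ulimit.
    """
    return {
        "core": resource.getrlimit(resource.RLIMIT_CORE),
        "stack": resource.getrlimit(resource.RLIMIT_STACK),
    }


def enable_core_dumps():
    """Raise the soft core-size limit to the hard limit for this process."""
    _, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    print(show_limits())
    print(enable_core_dumps())
```

Unlike the ~/.bashrc approach, this only affects the current process and its children.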

https://www.xuebuyuan.com/3235632.html
https://blog.csdn.net/qq_39759656/article/details/82858101

python installation https://askubuntu.com/questions/175412/how-can-i-get-python-stack-trace-information-using-gdb
python gdb https://wiki.python.org/moin/DebuggingWithGdb

Graph and Social Network Datasets

Posted 2019-08-04 | Updated 2019-08-09

Heterogeneous datasets

HGAE: DBLP, IMDB, ACM

Jie Tang's citation data: https://www.aminer.cn/citation

zhihu

Homogeneous

## Twitter dataset processing

On retweets, replies, quotes & favorites: A guide for researchers
Twitter open API documentation

Review Comments

Posted 2019-08-03 | Updated 2019-08-03

A paper written with great effort, and yet it received review comments like these…
How to organize the responses to review comments (RESPONSES)?

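Competitions

Posted 2019-08-03 | Updated 2019-08-09

Popularity prediction.

Link: http://www.acmmm.org/2017/challenge/

[ACMMM17 award-winning challenge paper report] Let the machine tell you who the next star is: a talk on Social Media Prediction (with download)

2016CCF-SouGou: Precise user profile identification for big-data precision marketing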

Clustering Algorithms Summary

Posted 2019-08-03 | Updated 2019-08-09

kmeans

sklearn documentation: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html

Introduction to common sklearn.cluster APIs (KMeans, MiniBatchKMeans)
Cluster points after KMeans clustering (scikit learn)

Clustering, the K-means algorithm: own implementation and calling sklearn.cluster.KMeans

Topic classification

Reference directory site for topic classification: dmoztool http://dmoztools.net/Society/Philosophy/Aesthetics/

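Knowledge Graphs

Posted 2019-08-03 | Updated 2019-08-09

Blog posts by the same author

TransE evaluation method
TensorFlow implementation of TransE

Paper collections

A big collection of knowledge graph papers, with substantial reading notes

[Knowledge Graph] 8 Trans models

Knowledge graph representation learning and relation reasoning (2016-2017)
Knowledge representation learning based on translation models (the Trans series)

Representation learning on knowledge graphs: an introduction to the Trans series of algorithms (1)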

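Graph Convolutional Neural Networks

Posted 2019-08-03 | Updated 2019-11-05

Papers and materials

How powerful is the graph neural network

CIKM 2017: sharing of papers on the Network Embedding topic

AAAI 2019 "Graph Representation Learning" tutorial: a 180-page slide deck taking you from beginner to expert (download)
Representation Learning on Network: notes on network representation learning

Recommended GitHub project | A big list of graph neural network (GNN) resources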

github

https://github.com/nnzhan/Awesome-Graph-Neural-Networks
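Graph convolutional networks, graph attention models, graph autoencoders, graph generative networks, graph spatio-temporal networks, and various applications

## Inductive Representation Learning On Large Graphs

Inductive Representation Learning On Large Graphs [reading notes]

GraphSAGE code walkthrough (1) - unsupervised_train.py

Heterogeneous networks

PTE: Predictive Text Embedding through Large-scale Heterogeneous Text Network

A list of Graph Embedding papers on GitHub, a very valuable reference

GCN

How graph convolutional networks actually work: a minimal Numpy implementation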

CANE

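CANE:
For each edge, nodes are sampled at random according to their out-degree, and then one negative sample that does not form a self-loop is chosen. The training set built this way consists of triples.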
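
The triple construction described for CANE can be sketched as follows. This is my reading of the note, not the CANE reference code: `build_triples` is a hypothetical name, and excluding both endpoints of the edge as negative candidates is an assumption about what "not a self-loop" means.

```python
import random
from collections import Counter


def build_triples(edges, seed=0):
    """For each edge (u, v), draw one negative node with probability
    proportional to its out-degree and return (u, v, negative) triples."""
    rng = random.Random(seed)
    out_degree = Counter(u for u, _ in edges)
    nodes = sorted(out_degree)
    triples = []
    for u, v in edges:
        # Assumption: the negative must differ from both endpoints of the edge
        candidates = [n for n in nodes if n != u and n != v]
        if not candidates:
            continue  # no valid negative exists for this edge
        weights = [out_degree[n] for n in candidates]
        neg = rng.choices(candidates, weights=weights, k=1)[0]
        triples.append((u, v, neg))
    return triples


if __name__ == "__main__":
    edges = [(1, 2), (2, 3), (3, 1), (1, 3), (4, 1)]
    for triple in build_triples(edges):
        print(triple)
```

Nodes without outgoing edges have out-degree 0 and are therefore never drawn as negatives in this sketch.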

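GATNE:
Random walks are run over all edges. Each node is sampled N times, each time sampling L negative samples, which amounts to generating a sentence.
From the sampled vertices, the frequency of each vertex is counted and the vertices are sorted by frequency; training vertex pairs are then generated for each sentence according to skip-window.
The indices in the training samples have been remapped by frequency. For example, index2word: [3,4,1] means that the first index maps to node 3. train_input is numbered by word frequency: in the TF word2vec implementation, the higher a word's frequency, the smaller its class id. So in TF's word2vec, negative sampling in effect prefers high-frequency words as negative samples.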
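
The frequency-ranked indexing and the skip-window pair generation can be illustrated with a small sketch. It is not GATNE's actual code: `build_vocab`, `skipgram_pairs`, and the toy walks are hypothetical, and indices are 0-based here.

```python
from collections import Counter


def build_vocab(walks):
    """Rank nodes by frequency so that index 0 is the most frequent node."""
    counts = Counter(node for walk in walks for node in walk)
    # Sort by descending count; ties are broken by node id for determinism
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    index2word = [node for node, _ in ranked]
    word2index = {node: i for i, node in enumerate(index2word)}
    return index2word, word2index


def skipgram_pairs(walks, word2index, window):
    """Generate (center, context) index pairs within `window` steps of each other."""
    pairs = []
    for walk in walks:
        ids = [word2index[n] for n in walk]
        for i, center in enumerate(ids):
            lo, hi = max(0, i - window), min(len(ids), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((center, ids[j]))
    return pairs


if __name__ == "__main__":
    walks = [[3, 4, 3, 1], [4, 3, 3]]  # toy "sentences" produced by random walks
    index2word, word2index = build_vocab(walks)
    print(index2word)
    print(skipgram_pairs(walks, word2index, window=1))
```

For the toy walks, node 3 occurs most often and gets the smallest index, which reproduces the index2word = [3, 4, 1] example from the note.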

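How does log_uniform_candidate_sampler sample? Its implementation is here:

1. It samples an integer k from [0, range_max).

2. P(k) = (log(k + 2) - log(k + 1)) / log(range_max + 1)

As you can see, the larger k is, the smaller its probability of being sampled.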
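
The formula is easy to check numerically. The sketch below evaluates P(k), confirms that the probabilities sum to 1 and decrease with k, and draws samples by inverting the cumulative distribution log(k + 2) / log(range_max + 1). The function names are hypothetical, and the inverse-CDF sampler is one way to realize this distribution that I am assuming for illustration; it is not copied from the TensorFlow source.

```python
import math
import random


def log_uniform_prob(k, range_max):
    """P(k) = (log(k + 2) - log(k + 1)) / log(range_max + 1)."""
    return (math.log(k + 2) - math.log(k + 1)) / math.log(range_max + 1)


def log_uniform_sample(range_max, rng=random):
    """Draw k in [0, range_max) by inverting the CDF log(k + 2) / log(range_max + 1)."""
    u = rng.random()  # uniform in [0, 1)
    k = int(math.exp(u * math.log(range_max + 1))) - 1
    return min(k, range_max - 1)  # guard against floating-point edge cases


if __name__ == "__main__":
    range_max = 10
    probs = [log_uniform_prob(k, range_max) for k in range(range_max)]
    print([round(p, 3) for p in probs])
    print(sum(probs))  # the terms telescope, so the total is 1 up to rounding
    rng = random.Random(0)
    print([log_uniform_sample(range_max, rng) for _ in range(10)])
```

Because ids are assigned by descending frequency, the small ids favored by this distribution are exactly the high-frequency words.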

Reference: https://blog.csdn.net/qq_36092251/article/details/79684721 (sample indexing)

NLP

Graph Convolution Network for NLP

People in the field:

dblp:Chengqi Zhang