Thanks to advances in artificial intelligence theory and technology, the concept of “X+AI” has swept across all walks of life and has profoundly influenced the reform and development of traditional fields. Legal artificial intelligence is one such emerging interdisciplinary area. The traditional legal service industry struggles to meet growing legal demand because of manpower shortages and complicated work content, so legal artificial intelligence, with its low cost and high efficiency, has become an attractive solution. It is worth studying and has broad application prospects. One of its important research branches is predicting the accusation in a case. However, in real-world case data, differences in the nature of crimes and in sentencing considerations produce an obvious imbalance in the label distribution, which makes it difficult to predict accusations accurately and to identify rare labels.


To address these problems, we study accusation prediction based on judgment documents and augment the original dataset during preprocessing, at the data level, to reduce the degree of imbalance. This paper adopts a data augmentation method based on synonym substitution in data space. We implement and integrate three ways of over-sampling minority-class samples: using a synonym dictionary, training local word vectors, and introducing pre-trained word vectors. The SMOTE data augmentation algorithm is selected for comparison.
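The dictionary-based variant of synonym substitution can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the toy English tokens, the `SYNONYMS` table, the `replace_prob` parameter, and the helper names `augment` and `oversample` are all hypothetical. In the word-vector variants, the candidate lists would instead come from nearest neighbours in a locally trained or pre-trained embedding space.

```python
import random
from collections import Counter

# Hypothetical toy synonym dictionary (assumption, not from the thesis).
SYNONYMS = {
    "stole": ["took", "pilfered"],
    "car": ["vehicle", "automobile"],
    "victim": ["injured party"],
}

def augment(tokens, synonyms, rng, replace_prob=0.5):
    """Return a copy of tokens with some words swapped for a synonym."""
    out = []
    for tok in tokens:
        candidates = synonyms.get(tok)
        if candidates and rng.random() < replace_prob:
            out.append(rng.choice(candidates))
        else:
            out.append(tok)
    return out

def oversample(samples, synonyms, target, seed=0):
    """Add augmented copies until every label has at least `target` samples."""
    rng = random.Random(seed)
    by_label = {}
    for tokens, label in samples:
        by_label.setdefault(label, []).append(tokens)
    result = list(samples)
    for label, docs in by_label.items():
        # Cycle through the existing documents of a minority label.
        for i in range(max(0, target - len(docs))):
            source = docs[i % len(docs)]
            result.append((augment(source, synonyms, rng), label))
    return result

data = [
    (["defendant", "stole", "a", "car"], "theft"),
    (["defendant", "stole", "a", "phone"], "theft"),
    (["defendant", "stole", "cash"], "theft"),
    (["defendant", "struck", "the", "victim"], "assault"),
]
balanced = oversample(data, SYNONYMS, target=3)
counts = Counter(label for _, label in balanced)
print(counts)
```

Because substitution happens on tokens rather than on feature vectors, the augmented samples remain readable text and can be fed to any downstream classifier unchanged.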
For classification, we use a FastText model and an SVM model to evaluate the augmented datasets. The augmentation methods are compared in terms of the running time of the augmentation algorithm, the score achieved by the classifier, and the training and prediction time of the model, to determine how much improvement each method brings. The experimental results show that the synonym substitution algorithm based on locally trained word vectors, combined with the FastText model, achieves the best overall performance in terms of operational efficiency and classification quality.
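The abstract does not say which score is used to compare classifiers. As an illustration of why the choice matters on imbalanced labels, the sketch below assumes a macro-averaged F1 score (an assumption, not confirmed by the source) and contrasts it with plain accuracy. The helper names and the toy labels are hypothetical.

```python
from collections import Counter

def per_class_f1(y_true, y_pred):
    """F1 per label, computed from true/false positives and false negatives."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    scores = {}
    for label in set(y_true) | set(y_pred):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1, so rare labels count as much as common ones."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy imbalanced case: a classifier that always predicts the majority label.
y_true = ["theft"] * 9 + ["arson"]
y_pred = ["theft"] * 10

acc = accuracy(y_true, y_pred)
macro = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.3f} macro_f1={macro:.3f}")
```

In this toy case the majority-only classifier reaches 90% accuracy, while its macro F1 is below 0.5 because the rare label is never identified. That gap is what over-sampling the minority classes aims to close.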

Key Words: accusation prediction; data augmentation; SMOTE; FastText; SVM
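Abstract (Chinese)
Abstract (English)
Chapter 1 Introduction
1.1 Background and Significance of the Topic
1.2 Research Status at Home and Abroad
1.3 Research Objectives and Content
1.4 Thesis Structure
Chapter 2 Judgment Document Dataset: Introduction and Preprocessing
2.1 Dataset Introduction
2.2 Data Preprocessing
2.2.1 Data Cleaning
2.2.2 Chinese Word Segmentation and Stop-Word Removal
2.2.3 Data Analysis
2.3 Evaluation Metrics
2.4 Chapter Summary
Chapter 3 Augmentation Methods for Imbalanced Data
3.1 Introduction to the Data Imbalance Problem
3.2 Data Augmentation Methods in Data Space
3.2.1 Substitution Based on a Synonym Dictionary
3.2.2 Substitution Based on Word-Vector Similarity
3.3 Data Augmentation Methods in Feature Space
3.3.1 Vector Representation of Text
3.3.2 The SMOTE Algorithm
3.4 Statistical Analysis of the Augmented Dataset
3.5 Chapter Summary
Chapter 4 Comparative Experiments on Accusation Prediction Based on Judgment Documents
4.1 Basic Process of Chinese Text Classification
4.2 Experimental Models
4.2.1 SVM Model
4.2.2 FastText Model
4.3 Experimental Procedure and Analysis of Classification Results
4.3.1 Experimental Environment
4.3.2 Experimental Design
4.3.3 Analysis of Experimental Results
4.4 Chapter Summary
Chapter 5 Conclusions and Outlook
5.1 Summary of Work
5.2 Future Work
References
Acknowledgments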
