Tuesday, October 7, 2014

DATASETS


http://fr.nomao.com/labs/datasets

Nomao datasets release

Chief Scientist at Nomao
Nomao releases 7 datasets for teaching and research purposes.
These datasets can be used in several research areas, such as machine learning, data integration, natural language processing, data visualisation, directed network analysis, recommender systems and information retrieval. Some of them have already been used in challenges, research projects and published research works.

Dataset #1: data duplicates
Dataset #2: online reviews
Dataset #3: local search queries
Dataset #4: learning-to-rank
Dataset #5: text generation
Dataset #6: recommendation
Dataset #7: votings network

For more information, visit the Nomao Labs web page http://fr.nomao.com/labs/datasets or contact Estelle Delpech (estelle at nomao.com).

Dataset of intrusion detection with SYSLOG FILE for training neural network.

http://stackoverflow.com/questions/8819453/dataset-of-intrusion-detection-with-syslog-file-for-train-neural-network
There is actually a new intrusion detection dataset with labeled data and full packet capture, called the ISCX 2012 dataset, available for download via http://www.iscx.ca/dataset. You just need to fill in the form and submit a request. Since the data is labeled, you can easily use it for training and testing your neural network or any other type of algorithm. The dataset contains over 80 GB of data in pcap format, captured over a 7-day span with multiple attacks as well as normal traffic. Check the details at http://www.iscx.ca/dataset
The DARPA dataset and its derivative, the KDD 99 dataset, are very outdated.
There are 3 days of traffic with normal network activity that can be used for training purposes and 4 days of network activity that include complex multi-step attacks, each performed on a separate day. These 4 days can be used for testing purposes. Each day of network activity is captured in a separate pcap file for easier analysis.
Friday: Normal Activity. No malicious activity
Saturday: Normal Activity. No malicious activity
Sunday: Infiltrating the network from inside + Normal Activity
Monday: HTTP Denial of Service + Normal Activity
Tuesday: Distributed Denial of Service using an IRC Botnet
Wednesday: Normal Activity. No malicious activity
Thursday: Brute Force SSH + Normal Activity
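
As a rough illustration of the train/test split described above, here is a minimal Python sketch. It only encodes the day-by-day schedule from the answer; the names ATTACK_BY_DAY and split_days are made up for this example, and it does not read or assume anything about the actual pcap file names in the dataset.

```python
# Attack scenario per capture day, as listed in the schedule above
# (None means the day contains normal traffic only).
ATTACK_BY_DAY = {
    "Friday": None,
    "Saturday": None,
    "Sunday": "Infiltrating the network from inside",
    "Monday": "HTTP Denial of Service",
    "Tuesday": "Distributed Denial of Service using an IRC Botnet",
    "Wednesday": None,
    "Thursday": "Brute Force SSH",
}


def split_days(attack_by_day):
    """Return (training_days, testing_days): attack-free days go to training."""
    train = [day for day, attack in attack_by_day.items() if attack is None]
    test = [day for day, attack in attack_by_day.items() if attack is not None]
    return train, test


train_days, test_days = split_days(ATTACK_BY_DAY)
print("train:", train_days)
print("test:", test_days)
```

Since each day is stored in its own pcap file, a split like this can be applied at the file level before any packet parsing is done.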

ISCX - Information Security Centre of Excellence http://www.iscx.ca/index.php
The Information Security Centre of Excellence (ISCX) 2012 intrusion detection evaluation dataset consists of labeled network traces, including full packet payloads, which along with the relevant profiles are publicly available to researchers by applying at http://iscx.ca/dataset-request-form. A full description of the evaluation dataset can also be found at http://www.iscx.ca/datasets.


Datasets for Neural Network Training


http://deeplearning.net/
http://deeplearning.net/datasets/

Various Datasets
Yahoo released datasets - in terabytes
http://webscope.sandbox.yahoo.com/catalog.php?datatype=r&did=75




