TY - GEN
T1 - A distributed pipeline for DIDSON data processing
AU - Li, Liling
AU - Danner, Tyler
AU - Eickholt, Jesse
AU - McCann, Erin
AU - Pangle, Kevin
AU - Johnson, Nicholas
N1 - Funding Information:
The authors would like to acknowledge Michigan Sea Grant, which supported Erin McCann during her time at Central Michigan University. The authors would also like to thank Emmaleigh Wilson for her work in prototyping some image filtering and target tracking algorithms. Any use of trade, product, or firm names is for descriptive purposes only and does not imply endorsement by the U.S. Government. We thank the Great Lakes Fishery Commission for funding the collection of the DIDSON data.
Publisher Copyright:
© 2017 IEEE.
PY - 2017/7/1
Y1 - 2017/7/1
AB - Technological advances in the field of ecology allow data on ecological systems to be collected at high resolution, both temporally and spatially. Devices such as Dual-frequency Identification Sonar (DIDSON) can be deployed in aquatic environments for extended periods and easily generate several terabytes of underwater surveillance data which may need to be processed multiple times. Due to the large amount of data generated and need for flexibility in processing, a distributed pipeline was constructed for DIDSON data making use of the Hadoop ecosystem. The pipeline is capable of ingesting raw DIDSON data, transforming the acoustic data to images, filtering the images, detecting and extracting motion, and generating feature data for machine learning and classification. All of the tasks in the pipeline can be run in parallel and the framework allows for custom processing. Applications of the pipeline include monitoring migration times, determining the presence of a particular species, estimating population size and other fishery management tasks.
KW - DIDSON
KW - HDFS
KW - classification
KW - distributed processing
KW - surveillance
UR - http://dx.doi.org/10.1109/bigdata.2017.8258458
U2 - 10.1109/BigData.2017.8258458
DO - 10.1109/BigData.2017.8258458
M3 - Conference contribution
AN - SCOPUS:85047749050
SN - 9781538627150
T3 - Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
SP - 4301
EP - 4306
BT - Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
A2 - Nie, Jian-Yun
A2 - Obradovic, Zoran
A2 - Suzumura, Toyotaro
A2 - Ghosh, Rumi
A2 - Nambiar, Raghunath
A2 - Wang, Chonggang
A2 - Zang, Hui
A2 - Baeza-Yates, Ricardo
A2 - Hu, Xiaohua
A2 - Kepner, Jeremy
A2 - Cuzzocrea, Alfredo
A2 - Tang, Jian
A2 - Toyoda, Masashi
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 11 December 2017 through 14 December 2017
ER -